

INTRODUCTION 


The contents of this Technical Report were prepared for publication in
Technometrics to accompany papers by Gwilym Jenkins (1) and Emanuel Parzen (2)
which formed the core of a discussion of spectral analysis of time series for
statisticians at the Stanford meeting in August 1960. The present account is
based on an oral contribution to that discussion, but extends the treatment in
a number of directions.

It is hoped that it will serve as a useful introduction for statisticians
wishing to acquire a better understanding of spectral analysis.

John  Tukey 


(1) Gwilym M. Jenkins 1961, "General Considerations in the Analysis of Spectra",
to appear in Technometrics.

(2)  Emanuel  Parzen  1961,  "Mathematical  Considerations  in  the  Estimation  of 
Spectra",  to  appear  in  Technometrics. 

This session was to be expository and to be
directed to statisticians. Accordingly, the discussants
have a responsibility to provide such comments as may tend
to make both the two papers and the general subject more
understandable to statisticians, particularly by relating
spectrum analysis to statistical techniques and to fields
of application more widely familiar to them. Fortunately,
the connection between spectrum analysis and those aspects
of the analysis of variance which emphasize variance components
is extremely close.

One  essential  in  this  close  connection  is  that, 
as  emphasized  by  Jenkins,  all  practical  time  series  problems 
can  be  treated  as  if  time  were  discrete  and  the  available 
data  came  at  equally-spaced  intervals.  Since  most  problems 
can  also  be  treated  as  if  time  were  continuous,  there  will 
be  little  need  for  us  to  distinguish  continuous  time  from 
equi-spaced  discrete  time.  When  we  come  to  computation, 
time  always  can,  and  usually  will,  be  discrete. 

To  make  this  connection  evident,  however,  we 
shall  have  to  analyze  the  implications  and  foundations  of 
our  procedures  and  thinking  in  classical  analysis  of 
variance  more  deeply  than  usual.  It  is  fair  to  say  that 
the  spectrum  analysis  of  a  single  time  series  is  just  a 
branch  of  variance  component  analysis,  but  only  if  one 
describes its main difference from the classical branches
as a requirement for explicit recognition of what is being
done and why. In classical (i.e. single-response analysis-of-variance)
variance component analysis, one can (and
most of us do) analyze data quite freely and understandingly
with little thought about what is being done and why it is
being done. This is, perhaps unfortunately, not the case
for the time series analysis branch of variance component
analysis.


I 

VARIANCE  COMPONENTS  AND  SPECTRUM  ANALYSIS 

When  variance  components? 

When  conducting  analyses  of  data  in  conventional 
analysis-of-variance  patterns,  we  sometimes  pay  attention 
to  individual  values  of  main  effects,  interactions,  and 
the  like.  At  other  times,  we  pay  attention  to  estimates 
of  variance  components.  The  controlling  factor  in  this 
choice  is  the  character  of  the  sets  of  data  which  would  be 
considered  to  be  other  realizations  of  the  same  experiment 
(or  of  the  same  patterned  observation).  Thus,  if  we  were 
comparing  the  times  taken  by  the  five  outstanding  runners 
of  the  world  to  run  1500  meters,  another  realization  of 
the experiment would reasonably involve the same runners,
and it would be appropriate to pay attention to individual
main effects. If, however, we were considering the speeds
for a standard assembly operation as shown by five
assemblers drawn at random from a pool of 250 assemblers
in a large factory, another realization of the same
experiment would almost certainly involve a different group
of assemblers, since our concern would have been with
assemblers as a whole, rather than with 5 particular
assemblers. Consequently, in analyzing such data, we
would pay attention to the estimated variance component
for assemblers. (We are here concerned with the direct
issue of what aspect of the classification concerned receives
attention, not with the indirect, but perhaps
equally important, issue of how the character of this
classification  affects  the  proper  error  term  for  other 
main  effects  --  the  question  sometimes  discussed  in  terms 
of  "fixed,  mixed,  or  random  models".)  There  is  a  clear 
analog  to  this  choice  in  the  Fourier-oriented  analysis 
of  time  series. 

Let  us  first  consider  the  case  of  a  function  of 
time  which  is  periodic  with  known  period.  If  we  may  choose 
the  time  unit  for  convenience,  the  period  may  as  well  be 
$2\pi$, and the function will then have (in practice) a Fourier
series representation of the form

$$y(t) = a_0 + \sum_j (a_j \cos jt + b_j \sin jt).$$

Let us lay aside for the moment questions of errors of
measurement, numbers of (and spacings between) times at
which observations are made, and whether j has a finite
or infinite range. Since we are statisticians, concerned
with a statistical problem, the coefficients $a_0, a_1, b_1, a_2, b_2, \ldots$
are not to be thought of as constant, but rather as having
some joint distribution. This joint distribution reflects
the functions corresponding to "all the realizations" of
the same experiment or observational program. At one extreme,
the functions of time representing different realizations
might all be very nearly the same. If this is the
case, then, given a single realization, it is clearly appropriate
to concentrate our attention upon the estimated
values of $a_0, a_1, b_1, a_2, b_2, \ldots$. This is, of course, the
situation envisaged in classical harmonic analysis. One
opposite extreme, one which you may claim only a statistician
would think of, occurs when there are parameters $\sigma_0^2, \sigma_1^2, \sigma_2^2, \ldots$
and the a's and b's are independent normal deviates with

$$\text{ave}\, a_0 = \text{ave}\, a_j = \text{ave}\, b_j = 0, \quad \text{var}\, a_0 = \sigma_0^2, \quad \text{var}\, a_j = \text{var}\, b_j = \sigma_j^2/2.$$

Given one realization of such an experiment, it is only reasonable
to look at quadratic functions of the observations, and
to regard them as telling us about $\sigma_0^2, \sigma_1^2, \sigma_2^2, \ldots$. Specifically,
it is appropriate to look at $a_0^2, a_1^2 + b_1^2, a_2^2 + b_2^2, \ldots$
and at certain linear combinations of these quantities. In
contrast to classical harmonic analysis, this sort of
periodic-time-function problem is a variance component problem.
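
As a small illustration of this variance-component view, the following Python sketch draws many realizations of the coefficients under the assumptions just stated (independent normal deviates with var a_j = var b_j = sigma_j^2 / 2) and averages a_j^2 + b_j^2 across realizations. The function names, the particular values of sigma_j^2, the number of realizations, and the seed are illustrative assumptions, not part of the original discussion.

```python
import random

def simulate_coefficients(sigma2, n_realizations, seed=0):
    """Draw (a_j, b_j) pairs with var a_j = var b_j = sigma2[j] / 2."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_realizations):
        realization = []
        for s2 in sigma2:
            sd = (s2 / 2.0) ** 0.5
            realization.append((rng.gauss(0.0, sd), rng.gauss(0.0, sd)))
        draws.append(realization)
    return draws

def estimate_components(draws):
    """Average a_j**2 + b_j**2 over realizations, one value per harmonic."""
    n = len(draws)
    n_harmonics = len(draws[0])
    return [
        sum(r[j][0] ** 2 + r[j][1] ** 2 for r in draws) / n
        for j in range(n_harmonics)
    ]

# Hypothetical variance components for harmonics j = 1, 2, 3.
SIGMA2 = [4.0, 1.0, 0.25]
ESTIMATES = estimate_components(simulate_coefficients(SIGMA2, 20000))
```

With a single realization each a_j^2 + b_j^2 carries only two degrees of freedom; it is the averaging over many simulated realizations that lets the estimates settle near the sigma_j^2 here.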

The model which lies behind the classical tests of
significance in harmonic analysis, a line of development
finally completed by Fisher [1929], is an incomplete mixture
of the two we have just described, in which

$$y_{\text{observed}}(t) = y_{\text{fixed}}(t) + y_{\text{random}}(t).$$


In this decomposition the "fixed" component is usually
thought of as involving only one, two, or perhaps three
values of j, while, both most importantly and most dangerously,
the "random" component is thought of as having

$$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_j^2 = \cdots.$$

Equality of the $\sigma_j^2$, the analog for periodic
functions of being a "white noise", is exactly what would
hold if the "random" component consisted only of independent
(or merely uncorrelated) observational errors in observations
equally spaced through $(0, 2\pi)$. It is also, unfortunately,
exactly what is most unlikely to occur in practice (for
reasons to be discussed in a moment). As a consequence,
the practical application of such "largest value against
all the rest" tests of significance in harmonic analysis is,
to say the least, extremely limited. (If only our estimates,
$a_j^2 + b_j^2$, of $\sigma_j^2$ had more than two degrees of freedom, we could
improve the classical tests of significance by fitting some
sort of reasonable dependence of $\sigma_j^2$ upon j, before proceeding
to the construction of a significance test. Even with only
two degrees of freedom, some such replacement may be possible.)

Thus, even in the case of periodic time functions,
we have some situations which should be treated almost entirely
in terms of means, others which should be treated
entirely in terms of variance components, and still others
where both descriptions should be used together.

The  character  of  time 

Time  is  connected.  And  functions  of  time  reflect 
this  fact  in  their  structure,  not  only  in  the  tendency 
toward  continuity  shown  by  individual  time  functions,  but 
even  more  obviously  in  the  associated  probability  structures. 
When  a  time  function  is  wisely  regarded  as  generated  from 
constituents  coming  from  different  sources,  as  most  are, 
the  individual  constituents  are  not  likely  to  be  "white 
noises." (Not even the measurement error constituent!)
And, even more crucially, the processes by which these constituents
are combined are not likely to treat different
frequencies alike, so that even if the constituents were
white noises, their resultant would not be one. Both in the
periodic case and the more usual and general case of a
continuous spectrum, a random time function is rarely a
"white noise".

Another characteristic of time is that it is
quite frequently measured from an arbitrary origin. To
be sure, if the simple periodic case has an annual period,
we may place the computing origin of time where we will,
but that will not make 1 January and 1 July the same.
But if we are examining the harmonics of a 400-cycle
electrical voltage, there is no equally necessary or
special relation between local time and 400-cycle time.
In a repetition of the same experiment, the generator phase
at zero local time may well be equally likely to have any
value between 0 and $2\pi$. And if this is so, the situation
is  a  stationary  one.  (This  example  may  help  to  emphasize 
that  stationarity  is  a  condition  "across  the  ensemble",  a 
condition  relating  one  realization  to  another,  a  condition 
on  a  whole  ensemble,  that  it  is  not  a  condition  on  single 
realizations,  and,  most  specifically,  is  not  a  condition 
of  steadiness  within  individual  realizations.) 

Finally, phenomena in time are rarely periodic.
(In fact, when examined under a microscope, no known
phenomenon is precisely periodic.) Consequently, an
effective  Fourier  description  of  real  phenomena  can  rarely 
be  a  periodic  description.  We  must  allow  all  frequencies 
to  contribute,  and  hence,  as  Jenkins  has  explained,  must 
turn  to  a  continuous  spectrum. 

The statistically vital contrast between situations
appropriately describable by means and situations appropriately
describable by variances continues here, as we should
have expected. The motions of a springboard from which a
diver has just leaped require all frequencies for their
description. The motions following successive leaps by
a single careful and precise diver will be relatively similar.
They will, as a whole, probably be most appropriately described
"by means", by a description of the typical time history of
board motion. But if no diver is present, if the springboard
is vibrating through a very small amplitude because
the wind is blowing on the board and its supports, and because
the ground itself is vibrating because of vehicle traffic and
factory machinery, the situation is likely to be quite different.
The characteristics of this "noise-like" motion of
the springboard which are maintained from one realization
to another are of the nature of variance components rather
than means. And of course (as when a big grasshopper jumps
off a small, wind-and-traffic-vibrated springboard) there
are intermediate situations whose description appropriately
combines both means and variance components.

Which  variance  components? 

Discussion  has  proceeded,  up  to  this  point,  as 
though  the  statement  of  a  problem  automatically  fixed  a 
set  of  variance  components.  When  we  think  matters  over 
carefully, we find that this is far from being the case.
In an abstract problem, where only the pattern of the
observations and the symmetries of their distribution are
specified, without any indication of their interpretation
or understanding, there is no unique set of variance components.
Instead there are many sets, each interconvertible
by prescribable formulas into each other. Abstractly, the
best we can do is to say that any set of quantities such
that each of the second moments (pure and mixed) of the
observations can be expressed as a linear combination of
the quantities of the set (together with, say, the square
of the average of some general mean) can play the formal
role of a system of variance components. (If the quantities
in some set do not behave like variances we might prefer
to call them (together with the squared average)
second-moment components rather than variance components, though
we shall not be concerned with this particular precision
of language here.) Still one set of variance components
may be more convenient, and far more useful, than another.
Why?

Replicated  double  classifications 

If  we  examine  one  of  the  most  classical  patterns, 
a  replicated  double  classification  into  rows  and  columns, 
we  can  learn  why.  Let  us,  then,  consider  a  classical 
analysis of variance, based on a pattern involving d
observations in each of the $r \cdot c$ cells formed by crossing
r rows with c columns. The analysis of variance breakdown
into sums of squares, degrees of freedom, and mean
squares is standard, as are the definitions of variance
components. The well-known formulas for the average values
of mean squares are, if all population sizes are infinite:

$$\text{ave}\{MS \mid \text{rows}\} = \sigma^2 + d\,\sigma_{RC}^2 + dc\,\sigma_R^2$$

$$\text{ave}\{MS \mid \text{cols}\} = \sigma^2 + d\,\sigma_{RC}^2 + dr\,\sigma_C^2$$

$$\text{ave}\{MS \mid \text{int}\} = \sigma^2 + d\,\sigma_{RC}^2$$

$$\text{ave}\{MS \mid \text{dup}\} = \sigma^2$$

Why did we choose $\sigma^2$, $\sigma_{RC}^2$, $\sigma_C^2$ and $\sigma_R^2$ as the variance components
in terms of which we are to write out such formulas?
We could, for example, have used as variance components such
average values of differences between differently related
pairs of observations as, taking $i \ne I$, $j \ne J$, $k \ne K$:

$$\text{ave}(y_{ijk} - y_{IjK})^2,$$

$$\text{ave}(y_{ijk} - y_{IJK})^2.$$


Before  trying  to  answer  these  questions  we  must  look  back 
at  some  of  the  implications  of  the  way  in  which  they  were 
asked. 

The  term  "variance  component"  can  be,  and  is, 
appropriately  used  in  two  different  senses.  These  senses 
differ  in  effect,  but  only  when  the  underlying  situations 
differ,  so  that  no  contradictions  arise.  When  the  under¬ 
lying  situation  is  such  that  it  is  appropriate  to  consider 
means  in  the  first  instance  (the  pigeonhole  model  of  Cornfield 
and  Tukey  1956  includes  such  extreme  examples),  variance 
components are means over more specific quadratic quantities.
In particular, the within-cell or "duplication" variance
component $\sigma^2$ is the average of the variances of all the
cell populations. If these cell-population variances differ
from cell to cell, so too do the values of

$$\text{ave}(y_{ijk} - y_{ijK})^2,$$

since these averages will always be twice the variance of
the population in the corresponding cell.

When the underlying situation is at the other
extreme, so that only variance components should be considered,
then the labels upon the rows and columns can
wisely be regarded as purely arbitrary. This means that
if the same "individual" were to appear as a row in each
of two realizations of the same experiment, the numbers
labeling the two rows would be quite unrelated. Such lack
of relationship could be in the nature of the situation, or
could have been enforced by our insistence on a randomization
of the row numbers, separately for each realization,
before the data was made available for analysis. But if
the labels are arbitrary, we cannot think of one cell,
considered by itself, as different from another. Similarly
there will be only four kinds of pairs of cells: identical;
in same column but not in same row; in same row but not
in same column; in different rows and columns. And the
four corresponding average squared differences would have
the following values:

$$\text{ave}(y_{ijk} - y_{ijK})^2 = 2\sigma^2$$

$$\text{ave}(y_{ijk} - y_{iJK})^2 = 2\sigma^2 + 2\sigma_{RC}^2 + 2\sigma_C^2$$

$$\text{ave}(y_{ijk} - y_{IjK})^2 = 2\sigma^2 + 2\sigma_{RC}^2 + 2\sigma_R^2$$

$$\text{ave}(y_{ijk} - y_{IJK})^2 = 2\sigma^2 + 2\sigma_{RC}^2 + 2\sigma_R^2 + 2\sigma_C^2$$

Knowing either set of four quantities, either the 4 average
squared differences, or $\sigma^2$, $\sigma_{RC}^2$, $\sigma_C^2$ and $\sigma_R^2$, the other set
is very easily calculated.
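
The interconversion can be sketched in a few lines of Python. The sketch assumes the pure variance-component situation with infinite populations, in which a pair of observations in the same row but different columns differs through the duplication, interaction, and column components, and similarly for the other kinds of pairs. The function names and dictionary keys are hypothetical labels chosen for this illustration.

```python
def to_squared_differences(s2, s2_rc, s2_r, s2_c):
    """Average squared differences for the four kinds of pairs of observations."""
    return {
        "same_cell": 2 * s2,                              # i, j shared
        "same_row": 2 * s2 + 2 * s2_rc + 2 * s2_c,        # i shared, j differs
        "same_col": 2 * s2 + 2 * s2_rc + 2 * s2_r,        # j shared, i differs
        "different": 2 * s2 + 2 * s2_rc + 2 * s2_r + 2 * s2_c,
    }

def to_variance_components(d):
    """Invert the linear relations above: returns (s2, s2_rc, s2_r, s2_c)."""
    s2 = d["same_cell"] / 2
    s2_r = (d["different"] - d["same_row"]) / 2
    s2_c = (d["different"] - d["same_col"]) / 2
    s2_rc = (d["same_row"] + d["same_col"] - d["different"] - d["same_cell"]) / 2
    return s2, s2_rc, s2_r, s2_c

# Illustrative values only.
DIFFS = to_squared_differences(1.0, 0.5, 2.0, 3.0)
ROUND_TRIP = to_variance_components(DIFFS)
```

Both directions are linear, which is the sense in which the two sets are arithmetically equivalent.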

Why then do we prefer the first set, since they
are arithmetically equivalent? It must be because of some
matter of interpretation. And the interpretation must involve
not the realizations of a single experiment but the
comparison of two or more different experiments. In fact,
we feel that, for example, the sort of change of circumstances
which halves or doubles $\sigma_C^2$ while leaving $\sigma^2$, $\sigma_{RC}^2$, and $\sigma_R^2$
unaffected is easier to understand than the sort which
changes $\text{ave}(y_{ijk} - y_{iJK})^2$ without affecting its three fellows.
The prime criterion for selecting useful variance components
is that we should be more easily able to understand the changes
in the situation which would change some variance components
while leaving others alone.

Known-period  time  functions 

Let us now consider periodic time functions with
a fixed period and a stationary joint distribution. One
variance component description has already been given in
terms of $\sigma_0^2, \sigma_1^2, \sigma_2^2, \ldots$. (Normality is a matter of indifference
to us in the present instance.) Another can be
given in terms of Jowett's serial variance function [Jowett 1955]

$$V_h = \tfrac{1}{2}\,\text{ave}\,(y(t+h) - y(t))^2$$
which, on account of stationarity, must be the same for
all values of t. The formal relation between these two
schemes is easily found to be:

$$V_h = \sum_j \left(\sin^2 \frac{jh}{2}\right) \sigma_j^2.$$

The formal similarities between the two pairs of mutually
related variance-component schemes, one for the replicated
two-way table, and the other for stationary periodic time
series, are very striking, but the actual similarities go
deeper.
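
The relation between the serial variance function and the frequency variance components can be checked numerically. The Python sketch below assumes the model described earlier (independent coefficients with var a_j = var b_j = sigma_j^2 / 2) and the serial variance function taken as half the average squared difference; it evaluates that half average directly from the coefficient variances and compares it with the sum of sin^2(jh/2) sigma_j^2. The function names and the numerical values are illustrative assumptions.

```python
import math

def serial_variance_direct(sigma2, h, t):
    """Half the average of (y(t+h) - y(t))**2, from the coefficient variances.

    sigma2[j-1] is the variance component of harmonic j; the constant term
    a_0 cancels in the difference and so does not appear.
    """
    total = 0.0
    for j, s2 in enumerate(sigma2, start=1):
        d_cos = math.cos(j * (t + h)) - math.cos(j * t)
        d_sin = math.sin(j * (t + h)) - math.sin(j * t)
        # a_j and b_j are independent, each with variance s2 / 2.
        total += (s2 / 2.0) * (d_cos ** 2 + d_sin ** 2)
    return 0.5 * total

def serial_variance_formula(sigma2, h):
    """Sum over j of sin^2(j*h/2) * sigma_j^2."""
    return sum(
        s2 * math.sin(j * h / 2.0) ** 2 for j, s2 in enumerate(sigma2, start=1)
    )

SIGMA2 = [1.0, 0.5, 0.25]  # hypothetical components for j = 1, 2, 3
MAX_GAP = max(
    abs(serial_variance_direct(SIGMA2, h, t) - serial_variance_formula(SIGMA2, h))
    for h in (0.3, 1.0, 2.5)
    for t in (0.0, 0.7, 4.0)
)
```

The direct calculation gives the same value for every t tried, as stationarity requires.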

What are the simplest changes which we can contemplate
making in a situation involving stationary periodic
time functions? They are the results of such simple linear
operations as passing an electrical voltage
through a simple circuit consisting of resistances, condensers,
and inductances, or passing a
mechanical motion through a simple linkage of springs,
masses, and dashpots. (Such processes occur, in particular,
in almost every physical or chemical measuring instrument.)
Any such linear process will affect the amplitude and phase
of each harmonic in a characteristic way. If its effect
on a pure jth harmonic would be to multiply amplitude by
$|L_j|$, then the jth variance component of any stationary
ensemble of periodic time series (with period $2\pi$) will be
multiplied by $|L_j|^2 = L_j L_j^*$. There is no correspondingly
simple result for the serial variance function. Consequently,
the frequency-related variance components are much more useful
than serial variance functions in dealing with stationary
ensembles of fixed-period time functions.
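
A brief Python sketch may make the multiplication rule concrete. It assumes, purely for illustration, a first-order low-pass operation obeying tau * z'(t) + z(t) = y(t), whose complex response at harmonic j is 1 / (1 + i j tau); this particular circuit, the function names, and the numbers are assumptions and do not come from the text. The sketch passes one harmonic through the operation and compares the quadratic quantity a_j^2 + b_j^2 before and after.

```python
import math

def apply_linear_operation(a_j, b_j, response):
    """Pass a_j*cos(jt) + b_j*sin(jt) through an operation with complex response.

    Uses a*cos(jt) + b*sin(jt) = Re[(a - ib) * exp(ijt)].
    """
    c_in = complex(a_j, -b_j)
    c_out = response * c_in
    return c_out.real, -c_out.imag

def rc_response(j, tau):
    """Assumed example: steady-state response of tau*z' + z = y at harmonic j."""
    return 1.0 / complex(1.0, j * tau)

J, TAU = 3, 0.5
A_IN, B_IN = 1.2, -0.7
A_OUT, B_OUT = apply_linear_operation(A_IN, B_IN, rc_response(J, TAU))
GAIN_SQUARED = abs(rc_response(J, TAU)) ** 2

def ode_residual(t):
    """tau*z'(t) + z(t) - y(t) for the assumed operation; should vanish."""
    y = A_IN * math.cos(J * t) + B_IN * math.sin(J * t)
    z = A_OUT * math.cos(J * t) + B_OUT * math.sin(J * t)
    dz = J * (-A_OUT * math.sin(J * t) + B_OUT * math.cos(J * t))
    return TAU * dz + z - y
```

Since a_j^2 + b_j^2 is multiplied by the squared modulus of the response for every realization, so is its average across the ensemble, and no other harmonic is involved.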

(In highly mathematical language, the frequency
variance components are a basis for second moments which
simultaneously diagonalize the effects of all operations that
are linear and time-shift invariant -- all black boxes in the
sense of pp. xyz-uvw.)

It can be done with covariances

The discussion just given stressed the analogy
between classical analysis of variance and the analysis
of stationary periodic time series by using averages of
squares of differences of observations in both situations.
It would have been possible to have stressed the analogy
almost equally well by using covariances in both situations.
In the replicated row-by-column pattern, we have, when the
covariances are taken across the specification, from one
realization to another, WITH AN ENTIRE NEW SAMPLE OF ROWS
AND COLUMNS IN EACH REALIZATION:

$$\text{cov}\{y_{ijk}, y_{ijk}\} = \sigma^2 + \sigma_{RC}^2 + \sigma_R^2 + \sigma_C^2,$$

$$\text{cov}\{y_{ijk}, y_{ijK}\} = \sigma_{RC}^2 + \sigma_R^2 + \sigma_C^2,$$

$$\text{cov}\{y_{ijk}, y_{iJK}\} = \sigma_R^2,$$

$$\text{cov}\{y_{ijk}, y_{IjK}\} = \sigma_C^2.$$

These  covariances  across  the  ensemble  are  quite  analogous 
to  the  serial  covariances  in  the  time  series  case,  which 
are  given  by 

$$R(h) = \text{cov}\{y(t), y(t+h)\}$$

where the covariance is again across the ensemble, from
realization to realization, and whose relation to the
frequency variance components is, formally,

$$R(h) = \sigma_0^2 + \tfrac{1}{2} \sum_j (\cos jh)\,\sigma_j^2.$$

The  main  reason  for  approaching  the  analogy  in  terms  of 
averages of squared differences is a pedagogical one. It
seems  to  be  easier  to  think  about  the  averages  of  squared 
differences,  when  working  from  one  realization  to  another. 
After  all,  as  statisticians  we  are  quite  used  to  thinking 
about  the  average  value  of  some  quantity  we  have  managed 
to  measure  only  once.  But  it  is  a  much  further  cry  to 
think  about  a  covariance  of  two  quantities,  each  of  which 
has  been  measured  only  once. 

The qualitative nature of this distinction between
covariances and averaged squared differences is
notably  different  for  the  replicated  double  classification 
and  for  stationary  ensembles  of  periodic  time  series.  This 
is  due,  in  large  part,  to  our  tendency  to  expect  the  versions 
of  classifications  to  have  names,  to  try  to  think  in  terms 
of  situations  where  means  and  main  effects  are  more  important 
than  variance  components.  We  feel  that  if,  for  example,  i 
is  a  subscript  identifying  persons,  that  i  =  3  should  refer 
to  a  particular  person,  not  to  the  third  row  of  some  randomly 
arranged  data  array. 

Yet  in  a  situation  where  a  pure  variance  component 
approach  is  appropriate,  the  process  of  randomly  rearranging 
the  rows  of  the  data  array  generates  what  we  may  think  of, 
without  doing  too  much  violence  to  the  situation,  as  a  new 
(but clearly not independent) repetition of the experiment.
If we fix our eyes on particular values of i, j, k, I, J,
and  K,  consider  all  admissible  rearrangements  of  the  data 
array,  and  then  average  the  simplest  quadratic  expressions, 
we  are  led  to  suitable  symmetric  functions  of  the  original 
data  array  which  are  natural  estimates  of  the  covariances 
across  the  ensemble,  provided  the  latter  are  given  an 
averaged  interpretation. 

The usual practice in the spectrum analysis of a
single stretch of time series is entirely analogous to such
a procedure. Let us, for example, consider estimating
$\text{cov}(y_1, y_4)$. We have the original observations $y_1, y_2, y_3, \ldots$.
The results of shifting the time origin, one unit at a
time, and always dropping observations at negative times, are
first $y_2, y_3, y_4, y_5, y_6, \ldots$, then $y_3, y_4, y_5, y_6, y_7, \ldots$, and so on.
The pairs $(y_1, y_4), (y_2, y_5), (y_3, y_6), \ldots, (y_t, y_{t+3})$ are
"equivalent" (either because stationarity is assumed or because
we want an averaged covariance) and we can calculate a "sample"
covariance from these pairs. Such processes of imitating the
sought-for covariance across the ensemble with a sample
"covariance" wandering around the data pattern are inevitable
when only a single realization is available, be it in an
analysis-of-variance situation or a time series situation.
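
The sliding-pair calculation just described can be sketched in Python as follows. The divisor (the number of pairs) and the use of separate means for the leading and lagging members of the pairs are assumptions made for this illustration; other conventions are in common use. The function name is hypothetical.

```python
def lagged_sample_covariance(y, lag):
    """Sample "covariance" from the pairs (y[t], y[t + lag]) of one realization."""
    pairs = [(y[t], y[t + lag]) for t in range(len(y) - lag)]
    n = len(pairs)
    if n == 0:
        raise ValueError("series too short for this lag")
    mean_first = sum(u for u, _ in pairs) / n
    mean_second = sum(v for _, v in pairs) / n
    # Divide by the number of pairs (an assumed convention).
    return sum((u - mean_first) * (v - mean_second) for u, v in pairs) / n

# Six observations, lag 3: the pairs are (1, 4), (2, 5), (3, 6).
EXAMPLE = lagged_sample_covariance([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3)
```

Every pair used comes from the same realization, so the result imitates, rather than samples, the covariance across the ensemble.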

(In the time series situation, if and when we look
more deeply into the details of the situation, we may find
that the averages of squares of differences indeed, as Jowett
has suggested [1955, 1957, 1958], have real advantages over
covariances, insofar as problems associated with trends and
very low frequencies are concerned. But this is for the
future to reveal.)

Black boxes and the general case

A  discussion  exactly  analogous  to  the  one  just 
given for stationary ensembles of period-$2\pi$ time series
can  be  given  for  the  general  case  of  a  stationary  ensemble 
of  time  series.  We  shall  not  attempt  to  give  details  here, 
trying  only  to  hit  the  high  points. 

There  are  many  circumstances  under  which  it 
is  convenient  to  call  any  procedure  or  process  (be  it 
computational,  physical,  or  conceptual)  which  converts  an 
input  to  an  output  a  black  box.  In  dealing  with  time  series 
it  is  convenient  to  restrict  the  term  black  box  to  procedures 
or  processes  which  satisfy  two  further  conditions: 

(1) The output corresponding to the superposition of
two inputs is the superposition of the corresponding outputs.

(2) The only effect of delaying an input by a fixed
time is to delay the output by the same time.

If the procedure or process departs from one or both, it
is conveniently called a colored box, with specific colors
for specific sorts of departure.

Some examples of black boxes include:

(a) moving averages, such as

$$z_t = \frac{1}{h}\{y_{t-h+1} + y_{t-h+2} + \cdots + y_t\}$$

(b) time delays

$$z_t = y_{t-h}$$

(c) differences

$$z_t = y_t - y_{t-h}$$

(d) more general moving linear combinations

$$z_t = a_0 y_t + a_1 y_{t-1} + \cdots + a_h y_{t-h}$$

(e)  linear  electric  networks  (which  may 
include  amplifiers,  transmission  lines,  and  wave  guides), 

(f)  linear  mechanical  systems, 

(g)  linear  economic  systems, 

(h)  differentiation  with  respect  to  time, 

(i)  integration  with  respect  to  time. 

Clearly  many  of  the  most  important  computational,  physical, 
and  conceptual  processes  are  black  boxes  in  this  sense. 
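
The two defining conditions can be checked numerically for a computational example. The Python sketch below tests superposition and time-shift behavior for an equally-weighted moving average over finite stretches of data, and shows a squaring operation failing the superposition condition. The helper names, the test data, and the treatment of the ends of a finite stretch (only full windows are kept) are assumptions of this sketch.

```python
from functools import partial

def moving_average(y, h):
    """z_t = (y_{t-h+1} + ... + y_t) / h for every t with a full window."""
    return [sum(y[t - h + 1 : t + 1]) / h for t in range(h - 1, len(y))]

def squarer(y):
    """A "colored box": z_t = y_t**2 does not satisfy superposition."""
    return [v * v for v in y]

def is_superposable(box, x, y, tol=1e-12):
    """Condition (1): box(x + y) equals box(x) + box(y)."""
    combined = box([u + v for u, v in zip(x, y)])
    separate = [u + v for u, v in zip(box(x), box(y))]
    return all(abs(p - q) <= tol for p, q in zip(combined, separate))

def is_shift_invariant(box, y, shift, tol=1e-12):
    """Condition (2): shifting the input shifts the output by the same amount."""
    from_shifted_input = box(y[shift:])
    shifted_output = box(y)[shift:]
    return len(from_shifted_input) == len(shifted_output) and all(
        abs(p - q) <= tol for p, q in zip(from_shifted_input, shifted_output)
    )

X = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0]
Y = [0.5, -1.0, 3.0, 2.0, -2.0, 1.0]
average_of_three = partial(moving_average, h=3)

MA_SUPERPOSES = is_superposable(average_of_three, X, Y)
MA_SHIFTS = is_shift_invariant(average_of_three, X, 2)
SQUARER_SUPERPOSES = is_superposable(squarer, X, Y)
```

A passing check on particular data does not prove the conditions in general, but a single failure, as with the squarer, is enough to show that a box is not black.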

It is easy to show (if we grant a small amount
of continuity and a sufficient lack of dependence of
present output on what happened at $t = -\infty$) that, if the
input to a black box is $A \cos(\omega t + \delta)$, then the output has
to take the form $G(\omega) \cdot A \cos(\omega t + \delta + \varphi(\omega))$, where the amplification
$G(\omega)$ and the phase shift $\varphi(\omega)$ depend only upon $\omega$.
This brings every black box into the framework discussed
by Jenkins, so that

$$(\text{spectrum of output}) = [G(\omega)]^2 \cdot (\text{spectrum of input}).$$

The important thing about this relation, for our present
purposes, is that the variance component associated with a
single frequency (or narrow band of frequencies) in the output
is determined by the corresponding variance component
of the input. There is no mixing up of frequency variance
components. This is simultaneously true for all black
boxes, and is the basic reason why the user, be he physicist,
economist, or epidemiologist, almost invariably finds
frequency variance components the most satisfactory choice
for any time series problem which should be treated in terms
of variance components.
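
For a moving linear combination in discrete time, the amplification and phase shift can be computed and checked directly. The Python sketch below assumes weights a_0, ..., a_h applied to y_t, y_{t-1}, ..., y_{t-h}, for which the complex response at frequency omega is the sum of a_k exp(-i omega k); it feeds in a cosine and compares the output with the predicted cosine of the same frequency. The function names and the numerical values are illustrative assumptions.

```python
import cmath
import math

def frequency_response(weights, omega):
    """Complex response of z_t = sum_k weights[k] * y_{t-k} at frequency omega."""
    return sum(a * cmath.exp(-1j * omega * k) for k, a in enumerate(weights))

def filter_series(weights, y):
    """Apply the moving linear combination wherever a full window is available."""
    h = len(weights) - 1
    return [
        sum(a * y[t - k] for k, a in enumerate(weights)) for t in range(h, len(y))
    ]

WEIGHTS = [1.0 / 3.0] * 3          # equally-weighted moving average of 3 terms
A, OMEGA, DELTA, N = 2.0, 0.9, 0.4, 50

response = frequency_response(WEIGHTS, OMEGA)
GAIN = abs(response)               # amplification G(omega)
PHASE = cmath.phase(response)      # phase shift phi(omega)

series = [A * math.cos(OMEGA * t + DELTA) for t in range(N)]
output = filter_series(WEIGHTS, series)
predicted = [
    GAIN * A * math.cos(OMEGA * t + DELTA + PHASE)
    for t in range(len(WEIGHTS) - 1, N)
]
MAX_ERROR = max(abs(p - q) for p, q in zip(output, predicted))

# Factor by which the variance component at OMEGA is multiplied.
SPECTRUM_FACTOR = GAIN ** 2
```

The output contains only the input frequency, so the variance component at that frequency is simply multiplied by the squared amplification, with no contribution from any other frequency.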


II 

OTHER  ANALOGIES 

I hope that Part I has made the close relationship
between spectrum analysis of a single time series and variance
component analysis very much clearer. There are similar
analogies to other classical techniques. These are worthy of
mention here, even though we cannot take the space to describe
them in detail.

Even though the cross-spectrum analysis of two
or more time series was not discussed in this session (in
part because an understanding of the spectrum analysis of
one time series is an essential prerequisite), it is important
to point out that probably the most important
aspects of cross-spectrum analysis are cases of (complex-valued,
frequency-dependent) regression analysis in which
the  analog  of  a  regression  coefficient  is  the  ratio  of  a 
(complex-valued)  cross-spectrum  density  to  a  spectrum 
density,  and  is  estimated  by  the  corresponding  ratio  of 
estimates  of  averaged  densities.  (This  fact  will  not 
surprise  those  who  recall  that  a  simple  regression  coefficient 
is  estimated  as  the  ratio  of  a  sample  covariance  to  a  sample 
variance,  or  that  a  structural  regression  coefficient  is 
sometimes  estimated  as  the  ratio  of  a  sample  covariance 
component  to  a  sample  variance  component.)  In  studying 
time  series,  as  in  its  more  classical  situations,  regression 
analysis,  whenever  there  is  a  suitable  regression  variable, 
is  a  more  sensitive  and  powerful  form  of  analysis  than 
variance  component  analysis.  As  a  consequence,  one  major 
reason  for  learning  about  spectrum  analysis  is  as  a 
foundation  for  learning  about  cross-spectrum  analysis. 
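
A minimal sketch of such a frequency-dependent regression coefficient follows, in Python. Everything specific in it is an assumption made for illustration: the simulated input is white noise, the "output" is the input delayed one step and halved with a little added noise, and the band of nine Fourier frequencies and the helper names `dft` and `regression_coefficient` are hypothetical. The coefficient is the ratio of a band-averaged cross-spectrum estimate to a band-averaged spectrum estimate, as described above.

```python
import cmath
import math
import random

random.seed(0)
n = 256
x = [random.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical output: input delayed one step (circularly) and halved, plus noise.
y = [0.5 * x[(t - 1) % n] + random.gauss(0.0, 0.05) for t in range(n)]

def dft(series, k):
    """Discrete Fourier coefficient of `series` at angular frequency 2*pi*k/n."""
    m = len(series)
    return sum(v * cmath.exp(-2j * math.pi * k * t / m)
               for t, v in enumerate(series))

def regression_coefficient(x, y, band):
    """Ratio of band-summed cross-spectrum to band-summed spectrum of x."""
    cross = 0j
    power = 0.0
    for k in band:
        xk, yk = dft(x, k), dft(y, k)
        cross += xk.conjugate() * yk   # cross-spectrum contribution
        power += abs(xk) ** 2          # spectrum contribution
    return cross / power

band = range(28, 37)                   # nine frequencies centered on k = 32 (omega = pi/4)
b = regression_coefficient(x, y, band)
gain, phase = abs(b), cmath.phase(b)
```

The magnitude of the complex coefficient estimates the gain (here near 0.5) and its argument estimates the phase shift (here near -pi/4, the shift a one-step delay produces at this frequency), which is the extra information that a real-valued regression coefficient cannot carry.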

The other approaches to data associated, directly
or  indirectly,  with  the  analysis  of  variance  and  the  name 
of  R.  A.  Fisher  also  have  their  analogs  in  the  analysis 
of  time  series.  We  have  already  noted,  for  example,  how 
classical  harmonic  analysis  is  the  appropriate  approach 
to  known-period  time  functions  when  the  over-all  situation 
is  such  that  one  should  look  at  means  rather  than  at 
variances.

In dealing with the mean-like behavior of nonperiodic
time functions from a Fourier point of view, a
natural and effective approach is furnished by complex
demodulation in which the given stretch of data $\{X_j\}$ is
first converted into two stretches of (real) values, viz.

$$\{X_j \cos \omega_0 t\} \quad \text{and} \quad \{X_j \sin \omega_0 t\}$$

which can usefully be regarded as the real and (+ or -)
imaginary parts of one or the other of the complex stretches
of data

$$\{X_j e^{i\omega_0 t}\} \quad \text{or} \quad \{X_j e^{-i\omega_0 t}\}.$$

The  second  step  is  to  smooth  the  two  real-valued  stretches, 
smoothing  both  in  the  same  way.  The  simplest  smoothing 
process is the formation of equally-weighted "moving averages,"
but it is often desirable to use weights which taper
down  at  each  end  appropriately.  The  final  step  is  to 
display  the  result  in  various  ways,  including: 

(1) Plotting individual stretches of smoothed
values against time.

(2) Plotting corresponding smoothed values
against one another, using time as a parameter.

(3) Plotting against time the phase or the
magnitude of the complex number whose real and imaginary
parts are the corresponding smoothed values.

The interpretation of such plots is usually
guided by an understanding of what happens if a particular
single frequency or band of frequencies is prominent in
the original data. If the original data were simply
$X_j = A \cos(\omega t + \varphi)$, then the values of the two
modulation-product stretches would be

$$X_j \cos \omega_0 t = \tfrac{1}{2} A \cos[(\omega - \omega_0)t + \varphi] + \tfrac{1}{2} A \cos[(\omega + \omega_0)t + \varphi]$$

$$X_j \sin \omega_0 t = -\tfrac{1}{2} A \sin[(\omega - \omega_0)t + \varphi] + \tfrac{1}{2} A \sin[(\omega + \omega_0)t + \varphi]$$

and the result of smoothing these would be to nearly
eliminate both terms if $\omega$ was not near $\omega_0$, and to nearly
eliminate the terms in $(\omega + \omega_0)t + \varphi$ if $\omega$ is near $\omega_0$. The
results of smoothing, then, would, if $\omega$ is near $\omega_0$, be
close to

$$\tfrac{1}{2} A \cdot G(\omega - \omega_0) \cos[(\omega - \omega_0)t + \varphi]$$

and

$$-\tfrac{1}{2} A \cdot G(\omega - \omega_0) \sin[(\omega - \omega_0)t + \varphi]$$

where $G(\omega - \omega_0)$ is the magnitude of the transfer function
of the smoothing process (which we have assumed to use
symmetrical weights and thus not to affect phase). In
this simple case, a cosinusoidal variation of angular
frequency $\omega$ in the original, which may have been quite
effectively concealed by larger contributions at other
frequencies, has been demodulated, and appears as a
cosinusoidal variation at the very much reduced angular
frequency $\omega - \omega_0$, which is likely to be much more evident
to the eye. (Complex demodulation, the calculation and
smoothing of two stretches of modulation-products, is
necessary if we are to distinguish the results of demodulating
$\cos(\omega_0 + \delta)t$ from the results of demodulating
$\cos(\omega_0 - \delta)t$.)
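
The three steps (multiply, smooth, display) can be sketched in a few lines of Python. The sketch is illustrative and rests on assumptions that are not part of the discussion: the data are simulated as a unit-amplitude cosine slightly above the demodulating frequency plus a hypothetical interfering cosine three times as large, the complex form with $e^{-i\omega_0 t}$ is used, and the smoother is an equally-weighted moving average of 40 points.

```python
import cmath
import math

omega0 = 2 * math.pi / 10          # assumed demodulating frequency
omega = omega0 + 0.01              # frequency actually present in the data
A, phi = 1.0, 0.3                  # hypothetical amplitude and phase
n, L = 240, 40                     # series length, moving-average length

# Simulated data: the component of interest plus a larger one at another frequency.
x = [A * math.cos(omega * t + phi) + 3.0 * math.cos(2 * math.pi * t / 4)
     for t in range(n)]

# Step 1: form the complex modulation products X_j * exp(-i * omega0 * t).
z = [x[t] * cmath.exp(-1j * omega0 * t) for t in range(n)]

# Step 2: smooth real and imaginary parts in the same way (equal weights).
smooth = [sum(z[s:s + L]) / L for s in range(n - L + 1)]

# Step 3: quantities one would plot against time.
amplitude = [2 * abs(v) for v in smooth]       # close to A * G(omega - omega0)
phase = [cmath.phase(v) for v in smooth]       # advances at about omega - omega0
slope = (phase[-1] - phase[0]) / (len(phase) - 1)
```

In this example the smoothed amplitude stays close to 1 although the interfering component is three times larger in the raw data, and the phase advances by about 0.01 radians per step, the difference frequency. Replacing the equal weights by tapered weights would change `G` but not the procedure.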

This technique is the natural extension to the
nonperiodic case of the ideas underlying the classical
Buys-Ballot table [Stumpff 1937, pp. 132ff, or Burkhardt
1904, pp. 678-679], the so-called secondary analysis, and
Bartels's summation dial [Chapman and Bartels 1940,
pp. 593-599 or Bartels 1935, pp. 30-31]. It has to be
tried out on actual data before its incisiveness and power
are adequately appreciated.

Problems  involving  the  simultaneous  behavior 
of  more  than  two  time  series  have  not  been  worked  on 
in  a  wide  variety  of  fields  of  application,  but  enough 
has  been  done  to  point  the  way  and  suggest  the  possibilities. 
There will be an increasing number of instances where the
corresponding nontime-series problems would be naturally
approached by multiple regression. These can be effectively
approached by multiple cross-spectrum and spectrum
techniques which will be precise analogs of multiple
regression in spirit and, if care is taken in choice, in
the algebraic form of their basic equations. The differences
which will arise in the development will stem
from: 

(1) the fact that regression goes on separately
at each frequency (which produces merely an extensive
parallelism of results), and

(2) the fact that regression coefficients will
now take complex values rather than real values (which
enables us to learn a little bit more about the underlying
situation).

To  my  knowledge  the  multiple-time-series  analogs 
of  discriminant  functions  and  canonical  variates  have  not 
yet arisen in practice. But there would seem to be no
difficulty in analogizing either or both.

III

PARSIMONY  AND  ERROR  TERMS 


Parsimony 

It  appears  to  be  natural  to  try  to  set  up  statisti¬ 
cal  problems  in  such  a  way  that  the  numerical  values  of  only 
a  few  characteristics,  each  easily  estimated  from  the  obser¬ 
vations,  suffice  to  complete  the  fixing  of  a  probability  model 
for  the  situation.  And  it  appears  all  too  natural  to  feel 
that  such  presuppositions  as  normality  or  constancy  of  vari¬ 
ance  are  important,  since,  if  they  failed  to  hold,  the  whole 
situation  would  not  be  completely  fixed  by  the  values  of  those 
characteristics  which  are  easily  estimated.  But,  for  all  such 
naturalness,  the  working  statistician  knows  that  it  is  often 
useful  to  estimate  the  mean  of  a  population  whose  variance  is 
unknown,  and,  similarly,  that  it  is  often  useful  to  estimate 
the variance of a population that is non-normal (frequently
without trying to assess the nature and amount of its non-normality).
For characteristics to be usefully estimated,
it  is  not  necessary  that  their  values  complete  a  precisely 
stated  model. 

It is frequently the case that results about
designing an experiment are only precise when the characteristics
to be estimated complete a precisely stated model.
Thus the famous telephone query, "I'm going to do an experiment,
how many sheep should I use?" cannot be answered when
all else that is known is that the experimenter wants to
compare the means of two treatments to a precision of ±1.5
pounds of body weight, or that he wants to assess a simple
variance to within ±10% of itself. In the first of these instances,
precise design would require a precise variance of observation.
In the second, precise design would require precise
knowledge of distributional shape. Yet experiments can be,
and are, wisely, if not optimally, designed and validly
analyzed in the absence of such precise information.

Insofar as normality is needed only (i) to ensure
that knowledge of the spectrum would leave nothing else to
learn, or (ii) to ensure that pre-experimental assessments
of variability are precise, and these are the only reasons
why Jenkins is concerned with normality, normality is not of
great practical importance in spectrum analysis.

(It is fortunate that normality is moderately
closely approximated in certain applications, since there
are further branches of time series analysis, for example
those dealing with numbers of upcrosses or numbers of maxima,
for which normality is of crucial importance. Sequences of
zeroes and ones represent one ultimate expression of non-normality.
In some instances, such sequences are usefully
studied by spectrum analysis, in others they are not. The
difference has to do with which aspects of their behavior
are important.)

Indeed there is a very general principle of data
analysis upon which all examiners of main effects (in analyses
of variance) lean, whether they know it or not. This
can be boldly stated as the Principle of Parsimony, viz.,
IT MAY PAY NOT TO TRY TO DESCRIBE IN THE ANALYSIS THE COMPLEXITIES
THAT ARE REALLY PRESENT IN THE SITUATION. Every
time that one pays attention to main effects alone, whether
because they are so much larger than interactions, or because
the interactions cannot be estimated with sufficient precision,
or for almost any other reason, one is behaving in
accord with this principle. Thus this principle is widely,
though usually implicitly, adopted. The same principle applies
to the quadratic analysis of time series, to spectrum analysis
and its relatives, not just in a single way, but in some three
or four separate and distinct ways:

Normality

The  first  application  is  to  the  need,  or  lack  of 
need,  for  estimation  to  a  complete  specification,  for  either 
assuming  normality  or  estimating  more  complex  matters  than 
the  spectrum.  In  most  practical  situations  this  need  is 
non-existent. Knowledge about the spectrum of a probably
non-normal ensemble of time-functions can be useful, just as
knowledge about the mean of a population of imprecisely known
variance can be useful. (In either case, once the data has
been gathered, consistency of repetition is the appropriate
basis for judging the stability of the result, not assumptions
about normality or known variance.)

Stationarity

The second application of the general principle is
to the assumption of stationarity, the analog in time series
situations to the assumption of constancy of variance in more
classical situations. The assumption of stationarity is one
at which the innocent boggle, sometimes even to the extent of
failing to learn what the data would tell them if asked. Yet
I have yet to meet anyone experienced in the analysis of time
series data (Gwilym Jenkins is an outstanding example) who is
over-concerned with stationarity. All of us give some thought
to both possible and likely deviations from stationarity in
planning how to collect or work up data, but no one of us will
allow the possibility of non-stationarity to keep us from
making estimates of an average spectrum, any more than working
analysis-of-variance statisticians will refrain from estimating
a variance component because the variability thus
assessed may well have to be an average.

The  fact  that  the  spectrum  is  changing  with  time 
(or  elevation,  or  azimuth)  need  not  make  it  unwise  to  estimate 
one,  or  several,  average  spectra.  The  detection  of  waves  1 
millimeter  high,  1  kilometer  long,  with  a  10,000  kilometer 
fetch  [Munk  and  Snodgrass  1957]  was  based  upon  estimates  of 
spectra averaged over four-hour periods. The crucial point
in  identifying  the  length  of  the  fetch  was  the  rate  of  change 
of  the  center  frequency  of  this  distinctive,  but  very  small 
peak,  from  one  four-hour  period  to  another.  Once  we  admit 
that  we  are  estimating  an  average  spectrum,  we  have  admitted 
that  there  may  well  be  other  relevant  characteristics  of  the 
situation  beyond  the  spectrum,  that  estimation  is  not  com¬ 
pleting  specification.  Such  an  admission,  as  this  example 
shows,  is  a  good  thing  rather  than  a  bad  one. 

There  seems  to  be  extra  reluctance  to  consider  an 
average  spectrum.  It  is  hard  to  be  sure  of  the  principal 
reasons  for  this,  but  a  well-founded  desire  for  replication 
as  a  basis  of  security  is  likely  to  be  one.  If  only  one  time 
series  is  available  for  analysis,  as  is  far  too  often  the 
case in so many economic instances, it is comforting to believe
that, somehow, stationarity makes it possible to have
"replication" from one time period to another. The truth
is not so comforting. Stationarity is frequently absent.
Even when stationarity holds, something like "replication"
can only occur within the limits of a single stretch of
moderate length if the true spectrum is devoid of detailed
features (is sufficiently smooth in the small). And it is
surely not wise to trust in "replication" that may not be
there.

Harry  Press  notes  (private  communication)  that 
average spectra may hide an important departure from stationarity.
In an entirely similar way, the use of analysis of
variance on the results of an experiment comparing 12 treatments
in randomized blocks may hide a substantial dependence
of variability upon treatment, or a substantial dependence
of treatment effect upon block. These things can, and do,
happen. The possibility of their occurrence must be carefully
kept  in  mind.  But  this  fact  is  not  relevant  to  the  point  we 
have  just  been  discussing. 

Surely, if one has both adequate data and scientific
or insightful ground to fear non-stationarity, it will be wise
not to average spectra over too long a time. But the urge to
choose the averaging time wisely is strengthened by an understanding
that all data analyses estimate average spectra.

Wisely-chosen resolution

The  third  application  of  the  general  principle  is 
to  the  question  of  the  narrowness  of  the  frequency  ranges 
for which we should seek spectrum estimates. There are infinitely
many frequencies. The number of separate frequencies
over which we could seek estimates from a given body of data
is limited by the extent of the data, and grows without limit
as longer and longer pieces of data become available. But
it does not follow that we should always, or even usually,
work close to this limit. The analogy with an interaction
mean square in a row-by-column table is close and persuasive.
There are $r \cdot c$ individual estimates of the interaction mean
square, each based on just one of the residuals which remain
after fitting rows and columns, each involving just one degree
of freedom. How often does it pay us to calculate and compare
all these separate estimates? Only very rarely. (It is often
useful to calculate and compare a few estimates of an interaction
mean square, each based on a reasonable portion of
the available degrees of freedom.) The position with spectrum
estimates is analogous; to be effective we must
estimate averages over well-selected frequency ranges. (This
is in addition to the averaging over time necessitated by
lack of perfect stationarity.) In both instances, interaction
mean square and spectral estimate, it does not pay to try to
estimate too much detail, even if the detail is really there.
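
The gain in stability from averaging over frequency ranges can be illustrated with a small simulation. The Python sketch below is only that: the data are simulated white noise (so the true spectrum is flat), the bands of eight adjacent Fourier frequencies are an arbitrary assumed choice, and the coefficient of variation is used here as a convenient, assumed measure of instability.

```python
import cmath
import math
import random

random.seed(1)
n = 256
x = [random.gauss(0.0, 1.0) for _ in range(n)]   # simulated series, flat spectrum

def periodogram(series, k):
    """Raw one-frequency estimate at angular frequency 2*pi*k/n."""
    m = len(series)
    s = sum(v * cmath.exp(-2j * math.pi * k * t / m)
            for t, v in enumerate(series))
    return abs(s) ** 2 / m

# Finest available detail: one estimate per Fourier frequency.
raw = [periodogram(x, k) for k in range(1, 121)]

# Parsimonious alternative: averages over bands of 8 adjacent frequencies.
width = 8
banded = [sum(raw[i:i + width]) / width for i in range(0, len(raw), width)]

def coefficient_of_variation(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return math.sqrt(var) / mean

cv_raw = coefficient_of_variation(raw)
cv_banded = coefficient_of_variation(banded)
```

The 120 single-frequency estimates scatter about as widely as their own mean, while the 15 band averages are markedly steadier. The price is that detail narrower than a band can no longer be seen.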

Proper error terms

The  question  of  the  proper  error  term  is  a  classic 
of  the  analysis  of  variance,  often  relied  upon  to  separate 
the  men  from  the  boys  and  the  pastry  cooks.  It  is  well 
recognized  that,  for  example,  the  plot-to-plot  error  of  an 
agricultural  experiment  is  almost  certain  to  be  too  small, 
specifically  because  it  rules  out  place-to-place  and 
year-to-year components of variation. It is not too great
a  stretch  to  consider  this  question,  which  arises  for  time 
series  in  an  only  slightly  different  form,  a  fourth  example 
of  the  general  principle  of  parsimony.  For  while  it  will 
not  be  costly  to  estimate  plot-to-plot  variance,  it  is  likely 
to  be  costly  to  trust  it,  to  use  such  estimates  as  error 
estimates. Even its estimation may be costly, in the agricultural
situation, if the result is to expend too much
effort  on  choosing  the  optimum  plot  size,  on  doing  one's 
best  to  reduce  what  may  be  a  minor  source  of  variation. 

As  Jenkins  points  out  at  the  very  end  of  his  paper,  it  is 
not  uncommon  for  spectrum  estimates  based  upon  different 
experimental  repetitions  to  differ  more  than  might  be 
expected  from  their  internal  behavior.  (Statisticians 
familiar  with  any  of  a  wide  variety  of  other  situations 
would  be  surprised  if  this  were  not  so,  if  external  error 
were  not  larger  than  internal  error.)  As  a  consequence, 
it  is  not  likely  to  be  worth  while  to  expend  too  much  effort 
in  using  estimates  whose  windows  have  optimum  widths  and 
optimum  detailed  shapes,  since  this  may  mean  exerting  a 
large  effort  to  minimize  a  minor  component  of  variability. 

One way to describe matters is in terms of alternative
ensembles. In each repetition of the experiment,
the time series which is actually realized is drawn from
a different ensemble (from a different population each
element of which is a whole time series). Such a description
is entirely analogous to a description of an
agricultural experiment in which each local comparison
of two treatments is drawn from a population, but the
populations for different "places" or "years" differ.

The fact that matters may be appropriately described in
such a way often affects what we wish to estimate. If
an average comparison, in the agricultural situation,
depends upon the "place" in a way, or for reasons, that
we do not understand, we are usually driven to estimate,
not average responses at individual places, but rather
average  responses  for  all  places.  (These  are  the  natural 
"main  effects".)  There  are  situations,  however,  as  for 
example  when  studying  a  cheaper  substitute  to  see  if  it 
causes  occasional  deleterious  effects,  where  we  may  need, 
because  of  variation  from  place  to  place,  to  estimate  the 
value  of  the  least  favorable  average  response  and,  perhaps, 
the  frequency  with  which  similarly  unfavorable  situations 
will  arise  in  more  extended  practice.  The  situation  with 
time  series  is  exactly  similar. 

Most  of  the  time  we  shall  be  driven  to  estimation 
of  a  spectrum  averaged  over  repetitions,  where  the  pattern, 
or  the  causes,  of  the  changes  in  spectrum  from  repetition 
to repetition are not understood. This averaging over
repetitions, forced on us by alternative ensembles, is superposed
upon the averaging over time within repetition, partially
forced upon us by non-stationarity, and upon the
averaging  over  frequency  bands,  forced  upon  us  by  the 
limited  extent  and  amount  of  our  data.  What  we  estimate, 
then,  is  an  average  of  averages  of  averages.  We  have  come 
a  long  way  from  the  idea  of  a  tight  specification-estimation 
relationship,  where  everything  which  is  not  presupposed 
should  be  estimated.  But  it  is  well  that  we  have  done  so. 
And  no  one  who  has  considered  carefully  what  is  estimated 
by a main effect in a reasonably complex analysis of variance
can maintain that so much averaging is surprising or
unusual.

Just  as  in  more  conventional  areas  of  statistical 
application, there are situations, the comparison of vibration
intensity with structural strength being perhaps the
most obvious, where we shall need to estimate not the
average spectrum but some upper limit, perhaps an upper 99%
limit,  for  the  spectra  in  the  various  replications,  for  the 
spectra  of  the  various  alternative  ensembles.  But  such 
instances  are  the  exception,  not  the  rule. 

Effects upon balance between stability and resolution

In  any  case,  the  presence  of  true  differences 
between  repetitions,  of  differences  between  the  spectra  of 
the  alternative  ensembles,  will  surely  force  a  readjustment 
of  the  balance  between  stability  and  resolution.  The  main 
reason for estimating average spectral densities over relatively
broad frequency bands is to assure moderate stability
of  estimate.  If  variation  within  ensembles  should  be  small 
compared  to  variation  between  ensembles,  such  within-ensemble 
stability  is  of  little  value  to  us.  Thus  we  can  afford,  in 
such  circumstances,  to  improve  our  frequency  resolution  by 
estimating  spectral  densities  averaged  over  narrower  bands. 
(There  will  still  remain  a  natural  limitation  on  resolution, 
however,  associated  with  the  limited  duration  of  the  individual 
ensembles.)


IV 


SPECIAL PROBLEMS OF TIME SERIES


Resolution 

The  notion  of  resolution,  as  applied  in  optics  and 
other  branches  of  physics,  is  a  well-recognized  and  useful 
physical  concept.  It  does  not  have  any  single  definition  in 
numerical  terms,  and  it  is  well  that  it  does  not.  For  the 
general idea that "higher resolution" means "capable of
detecting more detail" is clear, while any one way of making
it quantitative would not be universally satisfactory. (If
you like, "resolution" is not "unidimensional". But whether
you like this fact or not, it would be unwise to make it
unidimensional by a fiat of definition.) Jenkins and Parzen
have  introduced  us  to  a  number  of  definitions  of  bandwidth. 
There  are,  and  will  be,  other  such  definitions.  The  value 
of  any  of  them  lies  in  what  the  values  of  the  variously 
defined bandwidths tell us about "resolution". No one definition,
nor even all the definitions so far given, can tell
us all about resolution. As Goodman pointed out in his
verbal discussion, such matters as "rejection slope in
db/octave away from the major lobe" or "db of rejection at
a particular frequency" can be important in particular circumstances.
Thus numerical values of bandwidths according
to  any  definition  closely  related  to  "resolution"  can  help 
us,  but  they  will  help  us  most  if  we  regard  them  as  telling 
us  part,  not  all,  of  the  story. 

Choice  of  resolution 

There  is  one  matter  upon  which  I  should  not  like 
to have my views misunderstood: the desirability in exploratory
work of making spectral analyses of the same data with
different resolutions (usually represented in packaged systems
of calculation of spectrum analysis by the use of varying numbers
of lags in the initial computing step, which is the calculation
of sums of lagged products). Let me be quite clear
that, in my judgment and according to my experience, it
definitely is very often desirable in exploratory work, and
sometimes essential, to make analyses of the same data at
differing resolutions. Moreover, it may be equally important
to use different window shapes and different prewhitenings.

The  place  where  Jenkins  and  I  differ  seriously,  at 
least  verbally  (and  I  suspect  the  difference  is  more  verbal 
than  actual)  is  in  the  utility  of  examining  some  sequence  of 
mean  lagged  products  as  a  firm  basis  for  choosing  the  number 
of such values to be inserted in an appropriate Fourier transformer,
and transformed into spectral estimates. Our difference
is greater still in connection with the adequacy of the
point  of  apparent  "damping  down"  of  these  values  as  a  basis 
for  choosing  this  number.  It  is  not  that  knowledge  of  the 
"damping  down"  lag  is  not  useful,  but  rather  that,  at  least 
in  my  view,  its  unthinking  use  may  be  dangerous. 

On  the  one  hand,  I  have  known  of  cases  where  the 
useful  estimates  of  power  spectra  came  from  stopping  well 
short  of  the  damping-down  point.  On  the  other  hand,  if  the 
spectrum  were  to  contain  one  very  large,  very  broad,  very 
smooth peak, and a close group of small, narrow peaks, the
mean  lagged  products  would  appear  to  damp  down  at  a  lag 
associated  with  the  width  of  the  large  broad  peak,  so  that  a 
spectrum  whose  resolution  was  associated  with  this  damping- 
down  point  would  fail  to  resolve  the  close  group  of  small 
peaks.  Here,  as  in  all  sorts  of  data  analysis,  there  is  no 
substitute  for  careful  thought  combined  with  trial  of  various 
alternatives.

It  is  natural  to  be  tempted  into  calculating  more 
spectrum  estimates  than  the  number  of  mean  lagged  products 
used  as  their  basis.  This  temptation  need  not  be  a  dangerous 
one,  once  it  is  realized  that,  given  the  mean  lagged  products 
and  the  shape  of  the  window,  all  the  possible  spectrum  estimates 
lie  on  a  cosine  polynomial  of  degree  equal  to  the  number  of 
lags  used.  Once  the  usual  number  of  spectrum  estimates  have 
been  calculated,  they  are  enough  to  determine  this  polynomial, 
and  the  calculation  of  further  estimates  is  equivalent  to  a 
process of cosine-polynomial interpolation. This does not
mean that calculating more estimates is useless, or that the
results of further calculation will lie close to the results
of straight-line interpolation between the points already calculated.
But it does mean that the additional estimates
provide no new information, only more detailed exposition of
information already present. And it means that drawing smooth
freehand curves through the original spectral estimates is
often much more useful than connecting them by segments of
straight lines.
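
The interpolation property can be verified directly. In the Python sketch below, the mean lagged products `c`, the Hamming-type lag weights `w`, the choice of M = 6 lags, and the function names are all hypothetical values chosen for illustration. The sketch computes the usual M + 1 estimates at the frequencies $\pi j / M$, recovers the cosine polynomial from them alone, and compares an interpolated value with a directly computed one.

```python
import math

def estimate(c, w, omega):
    """Spectrum estimate at omega from mean lagged products c[0..M]
    and lag-window weights w[0..M]: a cosine polynomial of degree M."""
    return w[0] * c[0] + 2 * sum(w[k] * c[k] * math.cos(k * omega)
                                 for k in range(1, len(c)))

def cosine_coefficients(p):
    """Coefficients a[0..M] of the cosine polynomial taking the values
    p[j] at omega = pi*j/M (end points carry half weight in the sums)."""
    M = len(p) - 1
    coeffs = []
    for k in range(M + 1):
        s = sum((0.5 if j in (0, M) else 1.0) * p[j] * math.cos(math.pi * j * k / M)
                for j in range(M + 1))
        coeffs.append(s / M if k in (0, M) else 2 * s / M)
    return coeffs

def interpolate(coeffs, omega):
    return sum(a * math.cos(k * omega) for k, a in enumerate(coeffs))

M = 6
c = [1.0, 0.6, 0.2, -0.1, -0.2, -0.05, 0.03]     # hypothetical mean lagged products
w = [0.54 + 0.46 * math.cos(math.pi * k / M) for k in range(M + 1)]  # assumed lag window

grid = [estimate(c, w, math.pi * j / M) for j in range(M + 1)]  # the usual estimates
coeffs = cosine_coefficients(grid)

omega_mid = math.pi * 2.5 / M                    # halfway between two grid points
direct = estimate(c, w, omega_mid)               # "additional" estimate
interpolated = interpolate(coeffs, omega_mid)    # same value, from the grid alone
straight_line = 0.5 * (grid[2] + grid[3])        # generally a different value
```

`direct` and `interpolated` agree to rounding error, so the extra estimate carries no new information. `straight_line` need not agree with either, which is the reason a smooth curve through the original estimates is the better display.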

Blurred estimands

In  discussing  the  general  principle  of  parsimony 
we  emphasized  the  need  to  estimate  averages  over  bands  of 
frequencies.  This  point  is  so  central  to  spectrum  analysis 
as  to  make  its  heuristic  and  intuitive  understanding  worth 
considerable  effort.  Let  us  begin  with  classical  situations. 
If  one  has  more  degrees  of  freedom  than  variance  components, 
then  one  can  find  estimates  of  some  (and  perhaps  all)  of  these 
variance  components  whose  average  values  do  not  depend  upon 
the other variance components. But once there are more variance
components than degrees of freedom, this need not be
the case. Consider a two-way r-by-c array of observations
in which there are $r \cdot c + 2$ variance components, viz. a rows
variance component, a columns variance component, and one
variance component for each of the $r \cdot c$ cells. (This is a
natural model when the variance of the cell contributions
varies irregularly from cell to cell.) In this situation
there is no estimate of any of the $r \cdot c$ cell variance components
whose average value is free of all the other variance
components.

In the time series case there are very many more
variance components than degrees of freedom. For, unless
some periodicity assumption holds perfectly (and I know of
not a single instance where it does), a contribution of the
form

$$A \cos \omega t + B \sin \omega t$$

is permissible for any value of $\omega$ in some interval. And as
statisticians know from bitter experience, at least all the
things that are permissible will happen. Thus, in principle,
there are infinitely many variance components, one for each
possible $\omega$. And, when the realities of band-limiting and of
finite  duration  of  data  are  faced,  there  are  only  a  finite 
number  of  observations  available,  and  hence  only  a  finite 
number  of  degrees  of  freedom.  There  is  no  hope  of  estimating 
all  variance  components  here,  even  by  using  impractically 
unstable  estimates. 

Bracketing undesired effects

Let  us  return,  for  the  moment,  to  a  situation  with 
a  finite  number  of  variance  components,  only  four  of  which 
will  enter  our  discussion.  Let  us  suppose  that  we  are  in¬ 
terested  in  estimating  a  particular  one  of  these  variance 
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O 

components,  ,  and  that  our  choice  has  narrowed  down  to 
three  quadratic  functions  of  the  observations,  whose  average 
values  are 

ave{A^  =  +  0.04  -  0.02  +  0.01 

ave  =  c^  +  0.06  +  0,04  +  0.02 

ave^G^  =  -  0.08  a|  -  0.05  -  0.03 

So long as we insist on using only a single quadratic function
of the observations, the choice of A, whose average value is
least affected by $\sigma_2^2$, $\sigma_3^2$, and $\sigma_4^2$, has a real advantage. But
if we were willing to look at two quadratic functions of the
observations together, then B and C are a more effective
choice, at least so far as average values go. For, on the
average, one is raised by the other variance components, while
the other is lowered. If, for example, the observations are
replicated m times, so that there are m A's, m B's, and m C's,
and so that, consequently,

    $\bar{B} + t s_B/\sqrt{m}$

is an upper confidence limit for ave B, while

    $\bar{C} - t s_C/\sqrt{m}$

is a lower confidence limit for ave C, then the interval

    $(\bar{C} - t s_C/\sqrt{m},\ \bar{B} + t s_B/\sqrt{m})$

is a confidence interval for $\sigma_1^2$, without regard for the
values of $\sigma_2^2$, $\sigma_3^2$, and $\sigma_4^2$. (No such confidence interval can
be based upon the m values of A.) Whenever we cannot get
estimates (of what we want to estimate) whose average values
are wholly free of what we do not want to estimate, the use
of such paired estimates, one underestimating and the other
overestimating, is likely to be useful and, perhaps, even
necessary.
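
The bracketing interval just described can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the function
name `bracketing_interval` is hypothetical, the multiplier `t` is supplied by
the caller rather than looked up, and the sample mean and standard deviation
of the m replicate values are used for each of B and C.

```python
import math
import statistics


def bracketing_interval(b_values, c_values, t):
    """Return (lower, upper) built from m replicate values of C and of B.

    lower = mean(C) - t * s_C / sqrt(m)   (C tends to underestimate)
    upper = mean(B) + t * s_B / sqrt(m)   (B tends to overestimate)
    `t` is a Student-t multiplier chosen by the caller (assumption).
    """
    m = len(b_values)
    if m < 2 or len(c_values) != m:
        raise ValueError("need the same number (>= 2) of B and C values")
    root_m = math.sqrt(m)
    lower = statistics.mean(c_values) - t * statistics.stdev(c_values) / root_m
    upper = statistics.mean(b_values) + t * statistics.stdev(b_values) / root_m
    return lower, upper


# Hypothetical replicate values, purely for illustration.
low, high = bracketing_interval([1.0, 2.0, 3.0], [0.0, 1.0, 2.0], t=2.0)
print(low, high)
```

Because each end of the interval is a one-sided limit, the joint confidence
attached to the pair is presumably somewhat lower than that of either limit
alone; the sketch makes no attempt to adjust for this.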

When we make estimates of spectrum densities, the
window which relates the average value of our estimate to
the spectrum is (for the apparently inescapable case of
equally-spaced data) inevitably a cosine polynomial (of
degree no larger than the index of the longest lag used).
It can vanish at only a finite number of points. Consequently
its main lobe, which points out the band of frequencies over
which we seek to estimate some average spectrum density, is
inevitably accompanied by minor lobes which allow leakage
from the parts of the spectrum outside the desired band to
affect the average value of our estimate, and hence to affect
its individual values. Even if we are willing to accept the
blurring due to averaging within the major lobe, as we must,
like it or not, we are rightly reluctant to face unknown
possibilities of leakage from other parts of the spectrum.
The cure is the same as for the example with four variance
components: use two estimates. (This time one estimate
should have all minor lobes negative while the other has all
minor lobes positive.) This general situation is discussed
more fully elsewhere [Tukey 1961(?)], and it is to be hoped
that some suitable pairs of estimates will soon be explicitly
available. (For one pair see Wonnacott 1961.)
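
To make the cosine-polynomial form of the window concrete, the following
Python sketch evaluates the window belonging to a given set of lag weights
and reports its smallest and largest values. The function names, the choice
of m = 10 lags, and the two weight patterns (equal weights and linearly
tapered weights) are assumptions made for illustration; neither pattern is
offered as one of the bracketing pairs hoped for above.

```python
import math


def spectral_window(lag_weights, omega):
    """Cosine polynomial W(omega) = w[0] + 2 * sum_{k>=1} w[k] * cos(k * omega).

    lag_weights[k] is the weight given to the mean lagged product at lag k;
    omega is in radians per data interval.
    """
    return lag_weights[0] + 2.0 * sum(
        w * math.cos(k * omega) for k, w in enumerate(lag_weights) if k > 0
    )


def window_range(lag_weights, grid=2000):
    """Smallest and largest window value on an even grid over [0, pi]."""
    values = [spectral_window(lag_weights, math.pi * j / grid) for j in range(grid + 1)]
    return min(values), max(values)


m = 10  # longest lag used (illustrative)
rectangular = [1.0] * (m + 1)                           # every lag weighted equally
triangular = [1.0 - k / (m + 1) for k in range(m + 1)]  # linearly tapered weights

print(window_range(rectangular))  # some minor lobes dip well below zero
print(window_range(triangular))   # never noticeably below zero
```

With equal weights some minor lobes are negative, so leakage can pull the
average value of the estimate either way; with the tapered weights the window
stays non-negative, so leakage can only raise it.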

Kinds of asymptosis

The purpose of asymptotic theory in statistics is
simple: to provide usable approximations before passage to
the limit. Consequently asymptotic results and asymptotic
problems are likely to be of limited utility when the
finiteness of a sample size or of some other quantity is of
overwhelming importance. (Thus, for example, the theorem that
maximum likelihood estimates are asymptotically normally
distributed with a certain variance-covariance matrix is
rarely of any use when there are only 1 or 2 degrees of
freedom for error.) It is sometimes hard, but almost
always important, to remember this fact.

Time series analysis follows its usual pattern,
"like most statistical areas, only more so", insofar as
asymptosis is concerned. For there are three distinct ways
in which time series data could tend toward a simplifying
limit:

(1) The total extent of all the stretches
of data available could become more nearly
infinite.

(2) The extent of each individual stretch
of data could become more nearly infinite.

(3) The bandwidth of the measurement could
become more nearly infinite (requiring a more
nearly vanishing interval between times of
recording).

The consequences of these three, which are quite distinct,
depend upon whether the resolution of the estimates to be
made (a) remains constant, (b) increases as fast as the
total extent, extent, or bandwidth of the data, or (c)
behaves in an intermediate manner.

If (1) occurs without (2) or (3), the possible
resolution does not increase, so that (a) is the only
relevant situation. The stability of individual estimates of
(averaged) spectrum density then increases essentially
proportionally to the total extent of data.

If (2) proceeds, (1) must also. If (2) and (1)
proceed without (3), the range of (aliased) frequencies to
be considered will not change, so that a constant number of
estimates corresponds to constant resolution, and to an
increase in stability essentially proportional to total extent
of data. If, on the other hand, the resolution is increased
proportionally to the total extent of data, the stability of
individual estimates will remain constant.

If (3) proceeds without (1) or (2), we may make
estimates over a wider and wider frequency range, but we
cannot obtain higher and higher resolution. For constant
resolution, we obtain constant stability.
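
The trade between stability and resolution in these cases can be illustrated
with a short Python sketch. It rests on an assumption not stated in this
discussion: the common rule of thumb that the equivalent number of degrees of
freedom of a spectrum estimate is roughly twice the product of the total
extent of data and the bandwidth over which the estimate averages. The
function name and the numbers are hypothetical.

```python
def approx_degrees_of_freedom(total_extent, bandwidth):
    """Rule-of-thumb stability (assumption): about 2 * total extent * bandwidth.

    total_extent is in time units and bandwidth in cycles per time unit,
    so the product is dimensionless.
    """
    return 2.0 * total_extent * bandwidth


# Constant resolution: doubling the total extent doubles the stability.
print(approx_degrees_of_freedom(100.0, 0.05), approx_degrees_of_freedom(200.0, 0.05))

# Resolution increased in proportion to extent: stability stays where it was.
print(approx_degrees_of_freedom(100.0, 0.05), approx_degrees_of_freedom(200.0, 0.025))
```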

In practice, where there are several repetitions,
several stretches of data, it may be that we can wisely
treat the total extent of all data stretches asymptotically
(especially when the additional variability in external error
should be considered), but I know of no single practical
instance where an asymptotic treatment of either stretch
length or band-limitation gives useful results.

The limitation on ultimate resolution due to
limited extent of data stretches, and the limitation on
frequency ranges for which estimates can be made due to
band-limiting, always seem to behave like small-sample
phenomena, and must be faced in detail. They do not at
all behave like large-sample phenomena, where everything
can be "smoothed out" and treated in a limiting, continuous
way.


V 

THE  MORAL 

To analyze time series effectively we must do the
same as in any other area of statistical technique: "Fear
the Lord and Shame the Devil" by admitting that:

(1)  The  complexity  of  the  situation  we 
study  is  greater  than  the  complexity  of  that 
description  of  it  offered  by  our  estimates. 

(2) Balancing of one ill against another
in choosing the way data is either to be gathered
or to be initially analyzed always requires
knowledge of quantities which cannot be merely
hypothesized, and which, in many cases, we cannot usefully
estimate from a single body of data, such as
ratios of (detailed) variance components or
extents of non-normality. Theoretical
optimizations based upon specific values of such
quantities may be useful guides, but only
when the failure of past experience (and the
present data) to give precise values for
these quantities is recognized and allowed
for.

(3)  There  is  no  substitute  for  some  sort 
of  repetition  as  a  basis  for  assessing  stability 
of  estimates  and  establishing  confidence  limits. 

(4) Asymptotic theory must be a tool, and
not a master.

The  only  difference  is  that  one  must  be  far 
more  conscious  of  these  acceptances  in  time  series  analysis 
than  in  most  other  statistical  areas. 

In  a  single  sentence,  the  moral  is:  ADMIT  THAT 
COMPLEXITY  ALWAYS  INCREASES,  FIRST  FROM  THE  MODEL  YOU  FIT 
TO  THE  MODEL  YOU  USE  TO  THINK  AND  PLAN  ABOUT  THE  EXPERIMENT, 
AND THENCE TO THE TRUE SITUATION.


VI

THREE MYSTERIES

Up to this point, we have been concerned with the
fundamentals of time series analysis and with the close and
cogent analogies between time series analysis and other areas
of statistics. As a consequence our remarks have related
most closely to the first of the two papers. It is now time
to turn to the second paper, which grapples with some of the
more detailed aspects of time series analysis. Here it seems
best to try to shed light on a few of the aspects which are
likely to seem most mysterious. Our attention will be given
to the mysterious importance of dividing sums of lagged
products by n rather than by n-k, to the mystery of how new
window patterns are sought, and to the mysterious importance
of choosing a window.

Does  the  divisor  matter? 

The major computational effort, as measured in
millions of multiplications or minutes of machine time, of
any conventional careful spectral analysis is expended on
calculation of the sums of lagged products

    $Z(k) = \sum_{i=1}^{n-k} X_i X_{i+k}$

(If these are calculated for k = 0, 1, 2, ..., m, some
$(m+1)n - m(m+1)/2 \approx mn$ multiplications will be required.)
The $X_i$ in this calculation will be raw, or prewhitened, or
otherwise modified observations, from which means, fitted
polynomials, or other fitted trends may or may not have
been subtracted. Unless unusually careful preparatory steps
for the elimination of very low frequencies were already
taken in the preparation of the $X_i$, the next step after
calculating these sums of lagged products will be adjustment
of these sums of lagged products for means or trends. It is
vital to deal in practice with such adjusted sums of lagged
products, as almost everyone who enters upon time series
analysis seems to have to learn for himself. (However, it
will save space and, hopefully, promote clarity if we omit
the word "adjusted" during the remainder of this discussion.
We shall omit it.) Having been told of sums of lagged
products, every analyst of variance expects us to go on to
mean lagged products. Going on is inevitable.
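
As a minimal sketch of this central computation, the Python below forms the
sums of lagged products for lags 0 through m and counts the multiplications
involved. The function names are hypothetical, and no adjustment for means or
trends is attempted.

```python
def lagged_product_sums(x, m):
    """Return [Z(0), ..., Z(m)], where Z(k) sums x[i] * x[i + k] over i = 0 .. n-k-1."""
    n = len(x)
    if not 0 <= m < n:
        raise ValueError("need 0 <= m < len(x)")
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(m + 1)]


def multiplication_count(n, m):
    """Multiplications for lags 0..m: (n) + (n-1) + ... + (n-m) = (m+1)n - m(m+1)/2."""
    return (m + 1) * n - m * (m + 1) // 2


sums = lagged_product_sums([1.0, 2.0, 3.0, 4.0], 2)
print(sums, multiplication_count(4, 2))
```

For n much larger than m the count is close to m times n, which is why the
number of lags carried dominates the cost.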

There is a question of the appropriate divisor.
If we had not corrected for the mean (or any trend) there
are cases to be made for both n and n-k. If we had corrected
for, say, a general linear trend (which absorbs 2 degrees of
freedom), there are cases to be made for n, for n-2, for n-k
and for n-k-2. Parzen gives attention, between his (4.6) and
(4.7), to some of the reasons for choosing n or n-2 rather
than n-k or n-k-2. By analogy with the analysis of
variance we might feel that n-k-2 (or, when no adjustment is
made, n-k) would be desirable because unbiasedness is good.
The unbiasedness argument is found not to be a strong one
in the time series situation.

Is this choice an important one for the analyst
or investigator whose concern is with the spectrum? You
should be happy to be told that the answer is "no". If
one's concern is with the spectrum, then the most important
thing about any quadratic function of the observations is
the spectrum window which expresses the average value of
the estimate in terms of the spectrum of the ensemble.
(The next most important thing is, of course, the variability
of the quadratic function.) This is just what we should
expect for a variance-component problem, where means and
other linear combinations of the observations are without
direct interest. For if, in some very complex (probably
unbalanced to begin with, and then peppered with missing
plots) analysis of variance, one is given the values of
certain mean squares (or other quadratic functions of the
observations), the first question one concerned with
variance components asks is "How are the average values of these
mean squares expressible in terms of our variance components?".
(The question about stability "How many degrees of freedom
should be assigned to each?" is important but secondary.)
If we know the windows associated with our spectrum
estimates, we need not be concerned, in the first instance, with
how these estimates were obtained. And, moreover, any linear
combination of the results of dividing the sums of lagged
products by n is also a linear combination of the results of
dividing the sums of lagged products by n-k, and vice versa.

The practicing spectrum analyst need not be
concerned with division by n or n-k, so long as he doesn't
misassemble formulas by combining some which are appropriate
for one divisor with others appropriate for the other.
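
The interchangeability of the two divisors can be checked numerically. In the
Python sketch below (function names and numbers are hypothetical), weights
a_k intended for sums of lagged products divided by n are converted to weights
b_k = a_k (n-k)/n intended for sums divided by n-k; the two linear
combinations then agree. Mixing a_k with the n-k divisor, or b_k with the n
divisor, is exactly the misassembly warned against above.

```python
def reweight_for_unbiased_divisor(weights_for_n, n):
    """Convert weights a_k meant for Z(k)/n into weights b_k meant for Z(k)/(n-k).

    sum_k a_k * Z(k)/n == sum_k b_k * Z(k)/(n-k)   when   b_k = a_k * (n-k)/n.
    """
    return [a * (n - k) / n for k, a in enumerate(weights_for_n)]


def linear_combination(weights, sums, divisors):
    """Weighted sum of mean lagged products, each formed with its own divisor."""
    return sum(w * z / d for w, z, d in zip(weights, sums, divisors))


n = 4
sums = [30.0, 20.0, 11.0]        # Z(0), Z(1), Z(2) for an illustrative series
a = [1.0, 0.5, 0.25]             # weights written for the divisor n
b = reweight_for_unbiased_divisor(a, n)

with_n = linear_combination(a, sums, [n] * len(sums))
with_n_minus_k = linear_combination(b, sums, [n - k for k in range(len(sums))])
print(with_n, with_n_minus_k)
```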

However, those interested in the theory of spectrum
analysis do need to give some attention to this choice,
partly because of the reasons given by Parzen, partly because
this choice affects just what functions of frequency the mean
lagged products are Fourier transforms of, partly for various
other reasons. The man who has a practical interest in the
autocovariance function, if there really be such, clearly
also has to take an interest in alternative estimates.

Unlikely though it may seem at first, there is a
moderately close analogy between the biased estimates supported
by Parzen and biased estimates which are reasonable in classical
analysis of variance. Consider data in a single classification
with r observations in each class, so that the between mean
square has average value $\sigma^2 + r\sigma_b^2$, where $\sigma^2$ is the error
variance component, and $\sigma_b^2$ is the between variance component.
If we wish to estimate the population average corresponding
to a particular classification, there is little doubt that
the sample mean for that classification is the most
reasonable estimate. But if we wish to depict the pattern of the
population averages corresponding to all classifications, we
should do something about the inflation of this pattern by
error variance; we should replace the pattern of observed
means by a suitably shrunken pattern. (In the simplest cases
it may suffice to shrink each classification mean toward the
grand mean by the factor $[r\sigma_b^2/(\sigma^2 + r\sigma_b^2)]^{1/2}$. In others the
method developed by Eddington for dealing with stellar
statistics [Trumpler and Weaver 1953, pp. 101-104] may need
to be applied.) The analogy with the time series case is
reasonably, in fact surprisingly, close. If we wanted to
estimate just one autocovariance, we should undoubtedly use
the unbiased estimate. But if we are concerned with the
pattern made by the estimated values, with the nature of
the autocovariance function, we may, as Parzen points out,
do better to use the biased estimate.
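
The simplest shrinkage mentioned above can be sketched as follows. This is an
illustration only: the function name is hypothetical, and the two variance
components are treated as known inputs, whereas in practice they would have to
be estimated from the analysis of variance itself.

```python
import math


def shrink_class_means(class_means, r, error_var, between_var):
    """Shrink each class mean toward the grand mean.

    factor = sqrt(r * between_var / (error_var + r * between_var))
    r is the number of observations per class; error_var and between_var
    are the two variance components, taken as given (assumption).
    """
    factor = math.sqrt(r * between_var / (error_var + r * between_var))
    grand_mean = sum(class_means) / len(class_means)
    return [grand_mean + factor * (mean - grand_mean) for mean in class_means]


# Illustrative values chosen so that the factor is exactly one half.
print(shrink_class_means([0.0, 10.0, 20.0], r=3, error_var=9.0, between_var=1.0))
```

Each individual shrunken mean is a biased estimate of its own population
average, yet the spread of the whole set is closer to the spread of the
population averages than is the spread of the raw means.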

(The extreme instance of the problem underlying
this choice in the time series case arises when one
5-minute record is "cross-correlated" [really
cross-covarianced] with another 5-minute stretch of the same time
series, as recorded an hour, a day, or a week later. If
the spectrum of the ensemble is relatively sharp, the average
value of the covariance will still tend to zero, but the
average value of its square will tend, not to zero, but
to a value depending upon the product of the 5-minute
duration with the width of the spectral peak. Thus if
one calculates autocovariances at lags from 24 hours
0 minutes to 24 hours 5 minutes one will almost certainly
find an apparently systematic wavy pattern in the unbiased
estimates of autocovariances or autocorrelations computed
for a particular realization. It is natural to believe
that this pattern is "real", although the true average
values of the autocovariances are actually very, very much
smaller in magnitude than the values found from a single
realization. Such patterns can be so regular as to mislead
investigators into an unwarranted belief that the presence
of a strikingly accurate underlying clock has been
demonstrated.)

How  can  I  construct  a  window? 

If we leave aside a few matters which really do
not matter here, although some of them are very important
elsewhere (such as adjustment for the mean, other devices
for rejection of very low frequencies, and division by n-k
not n), the function of lag by which the mean lagged products
are multiplied before Fourier transformation, and the window
(expressed in terms of $\omega - \omega_0$ and $\omega + \omega_0$ separately, where
$\omega_0$ is the center frequency of the estimate) through which the
spectrum determines the average value of the estimate, are
Fourier transforms of one another. (If you have never
followed a derivation of this, just take it on faith.)
Since every lag must be a multiple of the data interval,
one of these functions is a finite array of spikes, spaced
one data interval apart. The other function is a polynomial
in $\cos(\omega - \omega_0)$ of an appropriate degree.

While the discreteness of time is generally an
important aspect of the data, it is not important for our
present purposes, so that we may replace the spiky lag
window by a smooth function of a continuous variable without
altering its Fourier transform in any way which is essential
to the present discussion. (Provided that we began with,
say, at least 10-20 spikes.) Since we are going to
calculate mean lagged products for only a finite number of lags,
this continuous lag window must vanish outside a finite
interval. If it were possible, we would like to have its
Fourier transform, the corresponding spectrum window, also
vanish outside a finite interval, for then the average value
of the corresponding spectrum estimate would only involve
contributions from a restricted part of the spectrum.

It is, however, well known that a function and its
Fourier transform cannot both vanish outside finite intervals.
Indeed, they cannot both go to zero too rapidly as their
arguments tend to infinity. The standard example of a
function which, together with its Fourier transform, goes
to zero rapidly at infinity is the standard normal density
function, which, together with its Fourier transform, goes
to zero as the negative exponential of half the square of
its argument. Unfortunately, we cannot make use of the
normal density as a lag window, because it does not vanish
outside a finite interval.

Every statistician knows, however (or so the
phrase goes), how to approximate a normal distribution by
a bounded distribution. It is only necessary to consider
the distribution of means of simple random samples from any
bounded parent distribution. And what parent distribution
could be simpler than the rectangular (uniform)
distribution? If we take samples of size k, the Fourier transform
of the distribution of means will be of the form $(\sin u/u)^k$
where u is a multiple of $\omega - \omega_0$, depending upon k and the
number of lags used. The larger is k, the smaller are the
minor lobes of this window in comparison with the main lobe
and the more lags are required to give a main lobe of
prescribed narrowness. If k=1, which corresponds to a raw
Fourier transform of the mean lagged products, the minor
lobes adjacent to the main lobe are about 1/3 the height
of the main lobe (and negative), which proves to be
impractical. If k=2, which corresponds to line 1 in Parzen's
Table 1, the minor lobes are at most 1/9 the height of the
main lobe, and the resulting spectral window, often called
the Bartlett window, is everywhere positive. If k=4, which
corresponds to line 8 in Parzen's Table 1, and to h^Cu) in
his Table 2, the minor lobes are at most 1/81 the height
of the main lobe, and the resulting spectral window, as
Parzen shows, is quite effective.
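
The way the minor lobes shrink as k grows can be examined numerically. The
Python sketch below works with the continuous idealization $(\sin u/u)^k$
only; the function names, the search range, and the grid size are assumptions
chosen for illustration. It searches beyond the first zero at u = pi for the
largest absolute window value, relative to the peak value of 1 at u = 0.

```python
import math


def sinc_power_window(u, k):
    """(sin u / u) ** k, with the limiting value 1 at u = 0."""
    if u == 0.0:
        return 1.0
    return (math.sin(u) / u) ** k


def largest_minor_lobe_ratio(k, u_max=40.0, steps=100_000):
    """Largest |window| for u in [pi, u_max], relative to the main-lobe peak of 1."""
    best = 0.0
    for j in range(steps + 1):
        u = math.pi + (u_max - math.pi) * j / steps
        best = max(best, abs(sinc_power_window(u, k)))
    return best


for k in (1, 2, 4):
    print(k, largest_minor_lobe_ratio(k))
```

In this idealization the computed ratios fall inside the bounds of 1/3, 1/9,
and 1/81 quoted above, and each doubling of k squares the ratio.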

It would be perfectly possible to use k=8 or k=16
if we wished even lower minor lobes. The cost to us of doing
this would be twofold. There would have to be an increase
in computational effort in order to provide mean lagged
products for the additional lags required to give a main
lobe of comparable width. And the shapes of the main lobes
would be somewhat less favorable, since the process of
raising the window to a higher and higher power will make
both the minor lobes and the lower portions of the main lobe
still lower. As a result the main lobe will "occupy" a
smaller and smaller part of the frequency band between the
zeroes (of the window) which define it, and, consequently,
the variability of the corresponding estimate (leakage
aside) will be greater than that of an estimate with a more
"blocky" spectrum window.

As is clear from Parzen's paper, these are not the
only useful lag windows, the "cosine-arch" or "hanning" lag
window which is proportional to "one plus cosine" being also
of practical interest. This latter window was "discovered"
by empirical observation, and the best reasons for considering
it are the properties it is found to have.

(Two further easily understandable types of window
which may sometimes prove useful may be obtained respectively,
(i) by taking a truncated normal distribution as the lag
window, (ii) by taking a Chebyshev polynomial for the spectral
window. This last choice makes all minor lobes of equal
height, and as small in comparison with the main lobe as is
possible for a given number of lags. This equality of height,
which makes the minor lobes adjacent to the main lobes lower
than those of most other windows but makes minor lobes far
away from the main lobes relatively higher than those of
most other windows, seems to prove to be a disadvantage
rather more often than it proves to be an advantage.)

How important is window choice?

We have discussed window carpentry briefly. Now
we need to ask what it buys us: how much better can we
do with a specially constructed window than with a rather
routine one? This question has opposite answers, depending
on whether one relies upon his window to do everything for
him, or not.

If one relies solely upon windows, faces a peaky
or steeply slanting spectrum, and is concerned with the
behavior of the spectrum where the density is noticeably
below its highest values, then the quality of workmanship
and polish of the window used can easily be of the utmost
importance. (During the early '50s I spent considerable
effort on a variety of ways to improve windows. The results
have never been published because it turned out, as will
shortly be explained, to be easier to avoid the necessity
for their use.)

If one applies his windows, actually or
effectively, not necessarily to the original data but, whenever useful,
to the results of simple linear modifications of the original
data, chosen so as to depress peaks, to raise valleys, and,
where necessary, to remove narrow peaks (which may appear
to be "lines"), he will rarely, if ever, find any need for
anything beyond a window of routinely good quality, such as
the hanning or cosine arch window (or, if a slight increase
in variance of estimate and a substantial increase in
computational effort are worth bearing, the k=4 window described
above). (For discussion of techniques of linear modification
see Blackman and Tukey 1959, Holloway 1938, and, perhaps,
the work of the Labroustes referred to by Chapman and Bartels
[1940, p. 992] and Blackman and Tukey [1959, p. 180].) In
my own experience this sort of approach to the problem,
which corresponds [Blackman and Tukey 1959, p. 42] to using
different window shapes in different frequency bands, is
much easier than seeking out explicit forms for very special
windows to meet each special situation. Moreover [e.g.
Blackman and Tukey 1959, pp. 62-63; Tukey 1959, pp. 315-316],
consideration of this technique leads to very helpful
insights into how the data is best gathered in the first place.
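
One very simple linear modification of the kind referred to can be sketched
in Python. Everything specific here is an assumption made for illustration:
the first-order filter y[t] = x[t] - a x[t-1], the coefficient a, and the
function names are not taken from the text. The filter depresses the low
frequencies of a spectrum that falls steeply with frequency; estimates made
from the modified series are afterwards divided by the filter's power
transfer function to refer them back to the original series.

```python
import math


def prewhiten(x, a):
    """Simple linear modification y[t] = x[t] - a * x[t-1] (one fewer value than x)."""
    return [x[t] - a * x[t - 1] for t in range(1, len(x))]


def power_transfer(omega, a):
    """|1 - a * exp(-i * omega)| ** 2 = 1 + a*a - 2*a*cos(omega), omega in radians per interval."""
    return 1.0 + a * a - 2.0 * a * math.cos(omega)


def recolor(estimate, omega, a):
    """Refer an estimate made from the modified series back to the original series."""
    return estimate / power_transfer(omega, a)


y = prewhiten([1.0, 2.0, 3.0], 0.5)
print(y, power_transfer(0.0, 0.5), power_transfer(math.pi, 0.5))
```

With a = 0.5 the power transfer runs from 0.25 at zero frequency to 2.25 at
the highest frequency, so a spectrum that falls off toward high frequencies
is flattened before any window is applied to it.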

But each of us is entitled to do his calculations
as he pleases, so long as he does adjust his techniques to
provide the amounts of precision and stringency his problems
require.


VII 

COMPUTATIONAL  CONSIDERATIONS 

It is important to say something about the role
of computational efficiency and computational choices as
considerations in time series analysis. Computational
considerations are particularly important in time series
analysis, in part because of the relatively large amounts
of data processed, in part because of the very many
multiplications involved in obtaining sums of lagged products,
and in part for more subtle reasons. And it is sometimes
hard, especially for the novice, to separate computational,
statistical, and aims-and-purposes considerations, one from
another. Yet if they are not separated, neither sound
practices nor sound advice can be understood as such, rather
than being taken on faith.




Computational considerations depend very much on
the equipment available. Crude spectral analysis is possible
with paper and pencil [Blackman and Tukey 1959, pp. 151-169],
and modestly refined computations have been done on hand
calculators. The beginning of effective spectrum calculation
probably involves the use of punched-card tabulators to obtain
sums of lagged products (by applying progressive digiting to
cards obtained by off-set reproduction [Hartley 1946]) and the
conduct of all further computation on hand calculators. The
steps from this to fully automatized spectrum analysis on
machines of the capacity and speed of an IBM 7090 or CDC 1604
are many and long. The reluctance or eagerness with which
one faces another hundred thousand multiplications depends
very strikingly on the equipment available.

And, consequently, so does one's attitude toward
using many more lags to improve window shape or increase
resolution, or toward recomputing mean lagged products
whenever new spectrum estimates (estimates differing in resolution,
in window shape, in prewhitening, or in rejection filtration)
are to be obtained from the same data. In the economy of
abundance which goes with modern electronic computers, I
prefer to recompute mean lagged products when a new set of
spectrum estimates is required, but others feel quite
differently. Some of the reasons for this difference can be
made manifest, and their mention may serve to illuminate a
variety of computational issues.

To recompute or not to recompute?

First, recomputation when necessary allows the use
of packaged, unified machine programs, which require only
values for a few constants and the data in order to provide
the desired spectrum estimates. This makes it much easier
for those unsophisticated in time series analysis, whether
investigators or technical aides, to process data
effectively. Most data analysis is going to be done by
the unsophisticated. As statisticians we have a responsibility
to package as many techniques as possible for safe and
effective use by those who will analyze data, and who will not
understand why the choices in the package were made wisely
or unwisely.

Next, and perhaps more important for the present,
is the absence of adequate facilities for data analysis.
There is no data-analytic language analogous to FORTRAN or
ALGOL, in whose terms it is easy to describe the operations
of data analysis, and, what is far more crucial, I know of
no large machine installation whose operations are adapted
to the basic step-by-step character of most data analysis,
in which most answers coming out of the machine will, after
human consideration, return to the machine for further
processing. Neither programming languages nor computer center
operations are adapted to stepwise operation, and all of us
who use big machines for data analysis are thus forced to
more  unified  operation  than  might  otherwise  be  desirable. 
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Third,  and  this  consideration  is  not  related  or 
restricted  to  big  machines ,  stepwise  computation  tends  to 
produce  stepwise  thinking.  I  believe  that  stepwise  thinking 
led  to  the  classical  Schuster  periodogram,  and  hence  to 
decades  of  ineffective  quiescence  for  frequency  oriented 
analysis  of  time  series.  The  individual  steps  from  data 
through  intermediate  results  to  periodogram  ordinates  seemded 
reasonable  each  by  itself.  And  while  Stumpff *s  book  recog¬ 
nized  the  nature  of  the  corresponding  spectral  window  before 
1940  [Stumpff  1937,  PP .  98-100],  nothing  was  done  to  provide 
more  useful  estimates  until  people  began  to  relate  average 
values  of  estimates  to  the  spectrum  of  the  ensemble  of 
which  the  data  is  one  realization.  What  security  we  can 
have in frequency-oriented time-series analysis comes from
over-all  thinking,  while  many  of  the  most  threatening 
dangers  come  from  step-by-step  thinking.  Thus  we  often  do 
very  much  better  to  apply  over-all  processes  (which  have 
been thought through over-all, not merely stepwise) to data
than  to  apply  the  individual  steps  separately.  This  view 
does  not  deny  the  great  desirability  of  "try,  look,  and  try 
something  a  little  different"  as  the  typical  pattern  of  data 
analysis.  It  merely  asks  that  each  trial,  unless  it  is 
extremely  exploratory,  be  thought  through  as  a  unit.  It 
does  not  even  say  that  it  is  unwise  to  calculate  sums  of 
lagged  products  once  and  for  all.  It  only  calls  on  those 
who do so to be sure that the total processes they apply to
data have been thought through as wholes. It does, however,
note that using preplanned packages increases the chances
that  such  thinking  will  have  been  done. 

Precision  may  matter 

Finally,  there  is  a  question  of  required  precision 
of  arithmetic.  Let  us  approach  this  somewhat  indirectly. 

In  friendly  conversation,  James  Durbin  recently  brought 
firmly  to  my  attention  that  there  was  an  alternative  to  first 
prewhitening the observations and then calculating sums of
lagged products for these modified values, remarking that
one might instead calculate rather more sums of lagged
products  for  the  original  observations,  and  then  calculate 
the  suitable  simple  linear  combinations  of  these  sums  which 
would  be  identically  equal  to  the  sums  of  lagged  products 
for  the  modified  observations.  This  remark  is  surely  well 
taken. The results are algebraically identical. And if
spectrum  estimates  for  the  results  of  enough  different 
prewhitenings  of  the  same  data  are  going  to  be  required, 
then  the  computational  path  suggested  by  Durbin  will  surely 
have real advantages. But it behooves us equally to consider
the possible disadvantages of this alternate approach.
Perhaps the greatest of these is the likely requirement of
greater precision of arithmetic (although it is interesting
to note that, if only one set of spectrum estimates is to
be calculated, prewhitening first will even save some
multiplications).
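
The algebraic identity behind Durbin's remark can be checked numerically. The sketch below is an illustration under stated assumptions, not a reconstruction of any program discussed here: it takes the prewhitening to be the simple filter y[t] = x[t+1] - a x[t] with an arbitrary constant a = 9/10, uses a short made-up integer series, and handles end effects by restricting the ranges of summation. The names lagged_sum, direct_route, and combined_route are hypothetical. Exact rational arithmetic is used so that the two routes agree identically; in finite-precision arithmetic they need not.

```python
from fractions import Fraction

def lagged_sum(x, lag, start, stop):
    """Sum of x[t] * x[t + lag] for t in range(start, stop)."""
    return sum(x[t] * x[t + lag] for t in range(start, stop))

def prewhiten(x, a):
    """Assumed prewhitening filter: y[t] = x[t + 1] - a * x[t]."""
    return [x[t + 1] - a * x[t] for t in range(len(x) - 1)]

def direct_route(x, a, max_lag):
    """Prewhiten first, then form sums of lagged products of y."""
    y = prewhiten(x, a)
    m = len(y)
    return [lagged_sum(y, k, 0, m - k) for k in range(max_lag + 1)]

def combined_route(x, a, max_lag):
    """Form lagged-product sums of x, then combine them linearly."""
    n = len(x)
    out = []
    for k in range(max_lag + 1):
        out.append(
            lagged_sum(x, k, 1, n - k)
            - a * (lagged_sum(x, k - 1, 1, n - k)
                   + lagged_sum(x, k + 1, 0, n - 1 - k))
            + a * a * lagged_sum(x, k, 0, n - 1 - k)
        )
    return out

# Made-up data and filter constant, chosen only for illustration.
x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
a = Fraction(9, 10)

direct = direct_route(x, a, 3)
combined = combined_route(x, a, 3)
print(direct == combined)
```

Note that each lag of the prewhitened series draws on three neighboring lags of the original series, which is why "rather more" sums are needed on the second route.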

This  statement  about  accuracy  sounds  a  little 
peculiar at first to one familiar with more classical statistical
computations, but when he recalls the advantages of
postponing divisions in calculating sums of squares of
deviations (and in more general analysis of variance computations)
he becomes aware of the practical inequivalence
of  algebraically  identical  forms  of  computation. 

An  adequately  prewhitened  time  series,  at  least 
one  that  is  a  realization  from  an  ensemble  which  produces 
spectrum  estimates  which  are  even  a  quarter  as  variable  as 
those  provided  by  a  Gaussian  ensemble  (most  ensembles 
arising  in  practice  will  produce  estimates  more  variable 
than those), requires the observations to be recorded to,
at most, only the precision offered by 1.5 to 2 decimal
digits [Tukey 1959b, pp. 319-320]. But one that is far from
adequately  prewhitened  may  require  several  decimal  digits. 
This  happens  because  the  spread  between  the  maximum  and 
minimum  observations  is  determined  by  the  (areas  of)  peaks 
in  the  spectrum,  while  the  precision  necessary  to  avoid 
serious loss of information about the spectrum is determined
by the depths of its valleys.

A similar difficulty can arise in so simple a
situation as fitting a quadratic polynomial, though there
most  statisticians  would  see  the  difficulty  coming  and 
evade  it.  Thus  if 

    y_i = 12.71 + 1,000,000 x_i + 0.03(x_i^2 - 1/3) + e_i

where x_i ranges from -1 to +1, var e_i = 10^{-5}, and we seek
to  find  the  quadratic  term  by  ordinary  quadratic  regression, 
it  will  not  suffice  to  use  y-values  with  only  7-decimal 
digits  of  precision,  because  rounding  to  units  introduces 
deviations  of  up  to  0.50  (which  is  large  compared  to  the 
maximum quadratic effect of +0.02) and increases the effective
error variance by a factor of more than 8000.
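
The arithmetic behind these figures can be sketched as follows. This is an illustration rather than part of the original computation. It assumes that the rounding error is roughly uniform on (-0.5, 0.5), so that its variance is about 1/12, and it checks that assumption on simulated data in which x is drawn at random (a choice made here only so that the fractional parts of y vary from observation to observation).

```python
import random

ERROR_VARIANCE = 1e-5        # var e_i, as given in the text
ROUNDING_VARIANCE = 1 / 12   # variance of a uniform error on (-0.5, 0.5)

# Largest quadratic effect 0.03 * (x^2 - 1/3), attained at x = -1 or x = +1.
max_quadratic_effect = 0.03 * (1 - 1 / 3)

# Factor by which rounding to units inflates the effective error variance.
inflation = (ERROR_VARIANCE + ROUNDING_VARIANCE) / ERROR_VARIANCE

def simulated_rounding_mean_square(n=20000, seed=1):
    """Mean squared rounding error when y is rounded to units."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        e = rng.gauss(0.0, ERROR_VARIANCE ** 0.5)
        y = 12.71 + 1_000_000 * x + 0.03 * (x * x - 1 / 3) + e
        total += (round(y) - y) ** 2
    return total / n

print(max_quadratic_effect, inflation, simulated_rounding_mean_square())
```

Under the uniform-error assumption the rounding variance of 1/12 is more than 8000 times the assumed error variance of 10^{-5}, which is the source of the factor quoted above.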

Similarly, in the time series case, if one is not
prepared to prewhiten first, when desirable, it is necessary
to make provision for moderate to high precision in input
data, and correspondingly higher precision in accumulating
sums of lagged products. The most likely result is a program
which computes sums of lagged products in double-precision
arithmetic, perhaps even floating-point double-precision
arithmetic.  This  means  extra  effort  at  many  stages  of  the 
computation. 
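
The effect of accumulator precision can be illustrated with a small experiment. The sketch below rests on several assumptions that are not in the text: present-day single- and double-precision binary floating point stand in for the two levels of precision, a contrived series (a steady ramp plus a small ripple) stands in for one dominated by low-frequency power, and first differencing stands in for prewhitening. Single precision is emulated by rounding every product and every partial sum, and all function names are hypothetical.

```python
import struct
from fractions import Fraction

def to_single(v):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]

def lagged_sum_double(x, lag):
    """Accumulate the sum of lagged products in ordinary (double) floats."""
    acc = 0.0
    for t in range(len(x) - lag):
        acc += x[t] * x[t + lag]
    return acc

def lagged_sum_single(x, lag):
    """Same sum, rounding each product and partial sum to single precision."""
    acc = 0.0
    for t in range(len(x) - lag):
        acc = to_single(acc + to_single(x[t] * x[t + lag]))
    return acc

def lagged_sum_exact(x, lag):
    """Reference value in exact rational arithmetic."""
    return sum(Fraction(x[t]) * Fraction(x[t + lag])
               for t in range(len(x) - lag))

def abs_error(value, x, lag):
    """Absolute error of a computed sum against the exact reference."""
    return abs(Fraction(value) - lagged_sum_exact(x, lag))

# Contrived data: a ramp (large range) plus a ripple in multiples of 1/4.
n = 4005
raw = [0.5 * t + 0.25 * ((7 * t) % 5 - 2) for t in range(n)]
# First differences, used here as a crude stand-in for prewhitening.
white = [raw[t + 1] - raw[t] for t in range(n - 1)]

lag = 1
err_single_raw = abs_error(lagged_sum_single(raw, lag), raw, lag)
err_double_raw = abs_error(lagged_sum_double(raw, lag), raw, lag)
err_single_white = abs_error(lagged_sum_single(white, lag), white, lag)

print(float(err_single_raw), float(err_double_raw), float(err_single_white))
```

For this particular series the single-precision sum of the raw values cannot be exact, while the double-precision sum of the raw values and the single-precision sum of the differenced values both are. The example is deliberately extreme; real data would show the same tendency less cleanly.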

No one of these four considerations rules out
calculating sums of lagged products once and for all, but
each exerts pressure. The combined effect influences me
very much, but I must admit that it might not be as
potent  if  the  calculations  with  which  I  was  concerned  were 
to  be  made  on  quite  other  computing  equipment. 


VIII 

OTHER  INTRODUCTORY  REFERENCES 

Where is the statistician to seek further enlightenment
about spectral analysis? It is hard to give
extensive lists of highly informative sources, but some
guidance  may  be  helpful. 

One  useful  route  for  many  statisticians  will  be 
to  turn  to  instances  where  the  technique  has  been  applied. 
A  list  of  references  to  recent  applications  can  be  found 
in either Tukey 1959a (pp. 408-411) or Tukey 1959b
(pp.  327-330).  These  lists  unfortunately  omitted  the 
1957  Symposium  at  the  Royal  Statistical  Society  on  the 
Analysis  of  Geophysical  Time  Series  [Craddock  1957, 
Charnock  1957,  Rushton  and  Neumann  1957,  and  discussion], 
where  further  references  to  geophysical  applications  can 
be  found. 